Abstract. Statistical analyses of finite sample distributions usually assume that fluctuations are self-averaging,
i.e. statistically similar in different regions of the given sample volume. By using the scale-length method, we test
whether this assumption is satisfied in several samples of the Sloan Digital Sky Survey Data Release Six. We find
that the probability density function (PDF) of conditional fluctuations, if filtered on large enough spatial scales
(i.e., r > 30 Mpc/h), shows relevant systematic variations in different subvolumes of the survey. Instead, for scales
of r < 30 Mpc/h, the PDF is statistically stable, and its first moment presents scaling behavior with a negative
exponent around one. Thus, while up to 30 Mpc/h galaxy structures have well-defined power-law correlations, on
larger scales it is not possible to consider whole-sample average quantities as meaningful and useful statistical
descriptors. This situation stems from galaxy structures corresponding to density fluctuations that are too large
in amplitude and too extended in space to be self-averaging on such large scales inside the sample volumes: galaxy
distribution is inhomogeneous up to the largest scales, i.e. r ≈ 100 Mpc/h, probed by the SDSS samples. We show
that cosmological corrections, such as K-corrections and standard evolutionary corrections, do not qualitatively
change the relevant behaviors. We consider in detail the relation between several statistical measurements generally
used to quantify galaxy fluctuations and the scale-length analysis by discussing how the breaking of self-averaging
properties makes a reliable estimation of the average fluctuation amplitude, variance, and correlations impossible
for r > 30 Mpc/h. Finally we show that the large-amplitude galaxy fluctuations observed in the SDSS samples
are at odds with the predictions of the standard ΛCDM model of structure formation.






Key words. Cosmology: observations; large-scale structure of Universe; 



1. Introduction

The statistical characterization of galaxy structures represents a
central problem for our understanding of the large-scale universe. Once
three-dimensional galaxy samples are provided by observations, one may
think the problem is relatively simple; i.e., all that remains to do is
to characterize the statistical properties of N points (galaxies)
contained in a volume V. However, there are several issues that must be
considered with great care; namely, (i) the definition of the
statistical methods employed and analysis of the assumptions implicitly
used by them; (ii) construction of the samples and the consideration of
cosmological corrections; (iii) comparison of results in galaxy catalogs
with model predictions. Even though each of these issues requires a
separate discussion, sometimes in the literature the reliability of
statistical methods is hidden by the problems related to cosmological
corrections and/or by sampling (or biasing) of a given distribution,
which is the problem to be considered when comparing results of
observations with theoretical predictions and cosmological N-body
simulations.^1 In this way one is not able to properly disentangle the
different problems and to ask the relevant questions at each step. For
this reason, in what follows we try to discuss the three issues above by
considering each in turn. In particular, only when there is agreement
about the statistical methods used will it be possible to compare
clearly results from different authors and to isolate the problems
related to cosmological corrections and/or sampling.

^1 In general particles in cosmological N-body simulations are supposed
to represent a coarse-grained distribution of the microscopic dark
matter particles. From the N-body dark matter particles one constructs
the galaxy density field, by using certain procedures, which can be
generally thought of as a sampling mechanism. The key element of this
selection is that galaxies are supposed to form on the highest density
peaks of the underlying dark matter field.

There has been an intense debate about the most suitable statistical
methods for characterizing galaxy properties, particularly galaxy
correlations (see Pietronero 1987; Davis 1997; Pietronero et al. 1997;
Sylos Labini et al. 1998; Joyce et al. 1999a; Wu et al. 1999;
Gabrielli and Sylos Labini 2001; Hogg et al. 2004; Joyce et al. 2005;
Baryshev & Teerikorpi 2006; Vasilyev et al. 2006;
Sylos Labini et al. 2007, 2009a,b,d).



The most suitable statistical method for characterizing
the properties of a given stochastic point process depends
on the underlying correlations of the point distribution
itself. There can be different situations for the statistical
properties of any set of points (in the present case,
galaxies) in a finite sample. Let us briefly consider
four different cases. In the first two, the galaxy
distribution inside a given sample is approximated by a
uniform stochastic point process; in other words, the
average density is well-defined inside the sample. This
means that the density, measured for instance in a sphere
of radius r randomly placed inside the sample, has small
fluctuations. In this situation the relative fluctuations
between the average density estimator and the "true"
density are less than unity. Density fluctuations may be
correlated, and the correlation function can be (i)
short-ranged (e.g., exponential decay) or (ii) long-ranged
(e.g., power law). In other words these two cases
correspond to a uniform stochastic point process with (i)
short-range and (ii) long-range correlations.

On the other hand, it may happen that the galaxy
distribution is not uniform. In this situation, the density
measured, for instance, in a sphere of radius r randomly
placed inside the sample, has large fluctuations; i.e., it
varies wildly in different regions of the sample. In this
situation the point distribution can generally present
long-range correlations of large amplitude. Then it may,
case (iii), or may not, case (iv), present self-averaging
properties, depending on whether measurements of the density
in different subregions show systematic (i.e., not
statistical) differences that depend, for instance, on the
spatial positions of the specific subregions. When such
systematic differences are present, the considered
statistics are not statistically self-averaging in space
because the PDF systematically differs in different
subregions, and whole-sample average values are not
meaningful descriptors. In general, such systematic
differences may be related to two different possibilities:
(i) the underlying distribution is not translationally
and/or rotationally invariant, (ii) the volumes considered
are not large enough for fluctuations to be self-averaging.

In determining statistical properties, a fundamental as- 
sumption is very often used in the finite-sample analysis: 
that sample density is supposed to provide a reliable esti- 
mate of the "true" space density, i.e., that the point distri- 
bution is well-represented by cases (i) or (ii) above. This 
corresponds to the assumption that the relative fluctuations
between the average density estimator and the "true" density
are smaller than unity. In general, this is a very strong
assumption that may lead to underestimating finite size 
effects in the statistical analysis. 

For instance, let us suppose that the distribution inside
the given sample is not uniform, i.e. cases (iii) and (iv)
above. In this case the results of the statistical analysis
are biased by important finite-size effects, so that all
estimations of statistical quantities based on the
uniformity assumption (i.e. the two-point correlation
function and all quantities normalized to the sample
average) are affected, on all scales, by this a priori
assumption that is inconsistent with the data properties
(Gabrielli et al. 2005). In addition, while for case (iii)
one may consider a class of whole sample-averaged
quantities, i.e. conditional statistics^2, in case (iv)
these become meaningless.

For this reason, our first aim is to study whether galaxy 
distribution is self-averaging by characterizing conditional 
fluctuations. If the distribution is self-averaging, then one 
can consider a whole-sample average quantity and study 
the possible transition from non-uniformity to uniformity 
by characterizing the behavior of, for instance, the condi- 
tional density. If the distribution is uniform, or becomes 
uniform on a certain scale smaller than the sample size, 
one can characterize the (residual) correlations between 
density fluctuations by studying the standard two-point 
correlation function. Therefore the consideration of $\xi(r)$
is the last point on this list, and it is appropriate only if 
one has proved that the distribution is self-averaging and 
uniform inside the given sample. 

These issues are relevant in studies of the galaxy
distribution because in the past twenty years it has been
observed that galaxy structures are organized in a complex
network of clusters, filaments, and voids on scales up to
hundreds of Mpc (see Kirshner et al. 1983;
Geller and Huchra 1988; Broadhurst et al. 1990;
Giovanelli and Haynes 1993; Gott et al. 2005;
Einasto et al. 2006a,b). From the statistical point of view,
the problem is whether these structures are compatible with
the very small characteristic length scale of the galaxy
distribution of about ten Mpc. This is the scale at which
the two-point correlation function is equal to unity, and it
has been measured to be in the range of 5-15 Mpc/h in
different (angular and three-dimensional) catalogs
(Totsuji and Kihara 1969; Davis and Peebles 1983;
Davis et al. 1988; Park et al. 1994; Benoist et al. 1996;
Norberg et al. 2001, 2002; Zehavi et al. 2002, 2005). The
essence of the problem is not whether these measurements
have been properly made, as indeed they have been, but
rather whether the statistical methods used to get this
result are consistent with the properties of the galaxy
distribution in these samples (see Gabrielli et al. 2005;
Sylos Labini et al. 2009c).

By measuring the redshift-dependent luminosity function
and the comoving radial density of galaxies in the
Sloan Digital Sky Survey (SDSS) Data Release 1 (DR1),
it has been found that the apparent number density of
bright galaxies increases by a factor ≈ 3 as redshift
increases from z = 0 to z = 0.3 (Loveday 2004). To explain
these observations, a significant evolution in the
luminosity and/or number density of galaxies at redshifts
z < 0.3 has then been proposed (Loveday 2004). However, an
independent test has not been provided to support such a
conclusion; in particular, the possible effect of large
density fluctuations on the basic assumptions used in this
analysis (i.e. large-scale uniformity of the density field)
was not tested, although it was noticed that these results
do not preclude significant density fluctuations in the
local universe on very large scales. In what follows, we
will carefully consider these results and present a
different conclusion for these observations, namely that
galaxy clustering on very large scales is certainly making
an important contribution to the observed behaviors of
galaxy counts.

^2 Conditional statistics are not normalized to the sample
density estimation (see Sect. 2).



Regardless of the origin of the big change in the spatial
density found by Loveday (2004), we note that the fact that
the density varies by a factor of three within the given
sample implies that it is meaningless to derive amplitudes
of fluctuations with respect to this quantity. Indeed, in
this situation the estimation of the amplitude of
fluctuations normalized to the sample density is biased by
systematic effects, and whole sample-averaged quantities,
such as the two-point correlation function and the power
spectrum, are not meaningful and stable statistical
descriptors. Another question we address here in more
detail concerns the physical origin of the density growth.
As mentioned, while Loveday (2004) concluded that the
density growth comes from evolution, leaving open, however,
the question of the contribution of large-scale structures,
we concluded that it stems from large-scale fluctuations
(Sylos Labini et al. 2009c). Here we show that, even if
relevant at such low redshifts, galaxy evolution is not
the main cause of the measured behaviors. This result is
reached by performing several specific tests that include
some rough determinations of the effect of evolution as in
Blanton et al. (2003) and Tegmark et al. (2004).



The paper is organized as follows. In Sect. 2 we give
a brief overview of our statistical methods, stressing the
role of assumptions and the properties of conditional and
unconditional fluctuations. Then in Sect. 3 we discuss the
procedure used for selecting the data from the SDSS-DR6
archive (Adelman-McCarthy et al. 2008) and the various
corrections applied to constructing the samples used
in the analysis. In Sect. 4 we discuss the main results of
the statistical analysis we considered, which concerns the
study of conditional fluctuations in the SDSS samples and
their PDF. Then, in Sect. 5 we compare the conditional
fluctuations in the real galaxy samples with the
predictions of theoretical models and with those measured
in mock galaxy catalogs constructed from cosmological
N-body simulations. These are the outcome of gravitational
N-body simulations of a concordance model, i.e. a Λ Cold
Dark Matter (ΛCDM) model (Springel et al. 2005), and
represent the predictions of theoretical models for the
correlation properties of non-linear structures
(Croton et al. 2006). Finally, in Sect. 6 we draw our main
conclusions.

2. Overview of statistical methods

There are several a priori assumptions that are generally
used in statistical studies of galaxy samples and that
require detailed consideration (see Gabrielli et al. 2005).
Galaxy distribution is considered to be a realization of a
stationary stochastic point process. This means that it is
assumed to be statistically translationally and
rotationally invariant, thereby satisfying the conditions
of statistical isotropy and homogeneity in order to avoid
special points or directions. These conditions are enough
to satisfy the Copernican principle, i.e., that there are
no special points or directions; however they do not imply
spatial homogeneity. Indeed an inhomogeneous distribution
can satisfy the Copernican principle even though it is
characterized by large voids and structures
(Sylos Labini 1994; Joyce et al. 2000; Gabrielli et al. 2005).

2.1. A brief summary of the statistical properties

We now briefly discuss several properties of stochastic
point processes (SPP) that are useful in the rest of the
paper (Gabrielli et al. 2005).



— A stationary SPP (SSPP) satisfies the conditions of
statistical translational and rotational invariance. It
can be uniform (spatially homogeneous) or nonuniform
(spatially inhomogeneous).

— An SSPP is ergodic if the ensemble average of a
statistical quantity characterizing its properties equals
its infinite-volume average. In a finite volume, only
volume-average determinations are defined (i.e. estimations
of statistical quantities). The ergodicity of an SSPP is a
necessary assumption when one wants to compare
volume-average quantities with theoretical predictions.

Let $\rho(r)$ be a microscopic density function, that is, a
realization of a given stochastic process. A stochastic
process is ergodic if a generic observable macroscopic variable
$F = F(\rho(r_1), \rho(r_2), \ldots)$ satisfies the following
relation: the average over an ensemble of realizations
$\langle F \rangle$ is equal to the spatial average
$\overline{F}$ defined by

\[ \overline{F} = \lim_{V \to \infty} \frac{1}{V} \int_V F(\rho(r_1 + r), \rho(r_2 + r), \ldots) \, d^3r . \tag{1} \]

When $V$ in Eq. (1) is finite, then $\overline{F}$ is a
statistical estimator of $\langle F \rangle$ in a given sample.
Therefore the assumption of ergodicity is necessary if we want
to use a statistical estimator to verify a theoretical
prediction, which is expressed in terms of ensemble averages.

— An SSPP is uniform if, in a finite but large enough
sample, fluctuations in the density are small enough.
For instance, the scale $\lambda_0$ at which an SSPP becomes
uniform can be defined to be the scale beyond which the
fluctuations of the average density filtered on that
scale are of the same order as the average density
itself, and then they are smaller on larger scales. To test
whether an SSPP is uniform one can use conditional
properties, which are defined also when the SSPP is
not uniform.

— A uniform SSPP inside a given sample has a well-defined
average density, i.e. the sample determination
is representative of the ensemble value within some
relatively small errors. Alternatively the amplitude of the
two-point correlation function is small enough on large
scales to guarantee that a positive average density exists.
This is, however, a necessary but not a sufficient
condition, as the amplitude of the estimator of this
function can be small also for a non-uniform distribution in
a finite sample. In the latter case however the amplitude
is not a significant statistical measurement.

— An SSPP has a well-defined crossover to homogeneity
if it is nonuniform on scales smaller than $\lambda_0$ and
uniform on larger scales. The length scale $\lambda_0$ marks
the transition from the regime of large fluctuations to that
of small fluctuations. At scales $r > \lambda_0$ one-point
statistical properties (i.e. unconditional properties) are
well defined. To study the approach to uniformity one should
consider conditional properties.

— A uniform SSPP can have long-range correlations,
i.e. be characterized by a non-zero two-point correlation
function at all scales. This latter case describes the
case of a ΛCDM model, which is indeed characterized by
large-scale super-homogeneity (Gabrielli et al. 2002).
A system can be uniform and, at the same time,
long-range correlated only if the amplitude of the
two-point correlation function $\xi(r)$ is small enough on
large scales.

— The range of correlations for a uniform SSPP is
measured by the functional behavior of the two-point
correlation function $\xi(r)$. If the system has critical
correlations, $\xi(r)$ is a power-law function of distance.

— An SSPP is nonuniform (or spatially inhomogeneous),
inside a given sample, if the conditional density does
not converge to a constant value. If the distribution
is self-averaging (see below) and nonuniform then the
conditional density is a varying function of the
distance. When this no longer changes as a function of
distance, the distribution is uniform.

— To test whether a nonuniform SSPP is self-averaging
in a finite volume and on a certain scale r, one may
study the PDF of conditional fluctuations. If this is not
statistically stable in different subvolumes of linear size
r, then the self-averaging property is not satisfied.

The self-averaging property is closely related to
ergodicity. In a volume of linear size $L$, any observable
$F$ has different values for different realizations of the
randomness (i.e. of the stochastic process) and is thus a
stochastic variable described by a PDF $P(F, L)$. By denoting
the average

\[ \overline{F}_L = \int F \, P(F, L) \, dF \]

and variance

\[ \Delta F_L^2 = \int F^2 \, P(F, L) \, dF - \overline{F}_L^2 , \]

a system is said to exhibit self-averaging if
(Aharony and Harris 1996)

\[ \lim_{L \to \infty} \frac{\Delta F_L^2}{\overline{F}_L^2} = 0 . \]

Equivalently, if the PDF $P(F, L)$ tends to a Dirac delta
function for $L \to \infty$ then the system is said to exhibit
self-averaging properties. In such a case, a single large
system is enough to represent the whole ensemble. When there
are long-range correlations, the property of self-averaging
is non-trivial, as self-averaging requires the size $L$ of
the sample to be larger than the range of correlations
(Aharony and Harris 1996). The concepts of ergodicity and
self-averaging refer to two different properties of a
stochastic process; namely ergodicity of the variable $F$
implies Eq. (1), while the self-averaging property has to be
ascribed to the ensemble variable $\overline{F}_L$, which is
determined in a finite sample.

Finally it is worth noticing that, if the distribution is
uniform, whether correlations are short or long ranged, any
global (spatially averaged) observable of the system has
Gaussian-type fluctuations, in agreement with the central
limit theorem. When there are long-range correlations of
large amplitude the central limit theorem does not hold and
global quantities usually have non-Gaussian fluctuations
(see Antal et al. (2009) for a more detailed discussion).

2.2. A toy model

To further clarify the concepts previously illustrated, we
discuss a simple toy model. We generate a stochastic point
distribution as follows. We randomly distribute rectangular
sticks in two-dimensional Euclidean space, and then the
points within each stick. The center of each stick and its
orientation are chosen randomly in a box of side $L$ (for
simplicity we fix $L = 1$). The points of each stick are
placed randomly within its area, which for simplicity we
take to be $\ell \times \ell/10$; these points have constant
density within each stick. The length scale $\ell$ can vary,
as can the number of sticks placed in the box. Different
realizations of this toy model are shown in Fig. 1. The
conditional average density (see Eq. (2) below), i.e. the
average density computed in spheres whose center is a
distribution point, is shown in Fig. 2, and the PDF of the
conditional density in Fig. 3.

By taking the dimension $\ell$ of the sticks (in this case
equal for all sticks) small enough and the number of sticks
large enough, one generates a uniform distribution
with positive correlations, i.e. $\xi(r) > 0$, on small scales
(Fig. 1, upper left panel, model T1). In this case the
average conditional density (Fig. 2) rapidly decays to a
constant value, and the PDF of fluctuations (Fig. 3) is
approximated by a Gaussian function. When the dimension
$\ell$ of the sticks is increased and their number is still
large enough, then the distribution is still uniform, but it
is positively correlated on larger scales. In the example
shown in Fig. 1 (upper right panel, model T2) the dimension
of the sticks is about the box side, making it a uniform
distribution with (weak) correlations extending up to the
box size. In such a situation, the average conditional
density reaches a constant value on a scale (the
homogeneity scale) comparable to the box size (Fig. 2).
Correspondingly, the PDF is Gaussian only when fluctuations
are filtered on scales comparable to the homogeneity scale
(Fig. 3).

We can then increase the dimension of the sticks further
and decrease their number (Fig. 1, bottom left panel,
model T3). In this case the distribution is not uniform, as
there are holes as large as the sample. The density thus
presents large fluctuations and it is not a well-defined
quantity on the sample scale. This is clearly a positively
correlated distribution, with long-range correlations (up
to the sample size in this case) of large amplitude. This
is shown by the behavior of the average conditional density
(Fig. 2), which does not converge to a constant value inside
the box. Therefore this is not a uniform distribution;
indeed, the PDF of fluctuations (Fig. 3), filtered on large
enough spatial scales, does not converge to a Gaussian
function. To show whether the distribution is
self-averaging inside the simulation box, one may compare
the full PDF with the ones measured in the two halves of
the box. One may see from Fig. 3 that, although there are
differences, the shape of the PDF is similar in the two
subsamples. In particular, the peak and the width of the
three PDFs are approximately the same.

Finally we can take sticks with different $\ell$. In the
example shown in Fig. 1 (bottom right panel, model T4),
$\ell$ is the same for all sticks except one, which has an
$\ell$ larger than the sample size. As for the previous case
this is a strongly correlated distribution, which is not
uniform inside the box. Indeed, the average conditional
density does not flatten inside the box (Fig. 2). In
addition, this distribution is not self-averaging. Indeed,
by measuring the PDF of conditional fluctuations in
different regions of the sample (say the upper and the
bottom parts, see Fig. 3), one finds systematic (i.e., not
statistical) differences. This is an effect of the strong
correlations extending well over the size of the sample.

A quantitative measurement of the breaking of the
self-averaging property is represented, for instance, by
determining the first and second moments of the PDF and by
checking whether they are stable in different subregions
of the samples. One may note from Fig. 3 that, for the
model T4, both the peak and the width of the PDF are
different when measured in different sample subregions or
in the whole sample box, thus indicating the breaking of
self-averaging.
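The stick construction described in this section can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used to produce the figures: the function name `stick_model`, the fixed number of points per stick, the `aspect` parameter (set to 1/10 as an assumed width-to-length ratio of the sticks), and the choice to discard points that fall outside the box are all assumptions made here.

```python
import numpy as np

def stick_model(n_sticks, length, pts_per_stick, rng, aspect=0.1, box=1.0):
    """Return an (N, 2) array of points drawn uniformly inside randomly
    placed and randomly oriented rectangles of size length x aspect*length.
    All parameter names are hypothetical; `aspect` is an assumed ratio."""
    centers = rng.uniform(0.0, box, size=(n_sticks, 1, 2))
    angles = rng.uniform(0.0, np.pi, size=(n_sticks, 1))
    # Offsets along and across each stick, uniform within its area.
    along = rng.uniform(-0.5, 0.5, size=(n_sticks, pts_per_stick)) * length
    across = rng.uniform(-0.5, 0.5, size=(n_sticks, pts_per_stick)) * length * aspect
    cos_a, sin_a = np.cos(angles), np.sin(angles)
    # Rotate the offsets by the stick orientation.
    x = along * cos_a - across * sin_a
    y = along * sin_a + across * cos_a
    pts = (centers + np.stack([x, y], axis=-1)).reshape(-1, 2)
    # Assumption: points of sticks protruding from the box are dropped.
    inside = np.all((pts >= 0.0) & (pts < box), axis=1)
    return pts[inside]

rng = np.random.default_rng(0)
many_short = stick_model(200, 0.05, 20, rng)   # many short sticks (T1-like)
few_long = stick_model(5, 1.0, 800, rng)       # few long sticks (T3-like)
```

Under these assumptions, many short sticks give a nearly uniform point set with small-scale clustering, while a few sticks as long as the box leave holes comparable to the sample size.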




Fig. 2. Conditional density for the toy models shown in
Fig. 1. The case of a Poisson point distribution is added as
a reference. (The conditional density has been normalized
to the number of points in the simulations.) The model
T1 has a short-range correlation, which corresponds to a
fast decay of n(r). The model T2 is still uniform on large
scales, i.e. n(r) is flat. The models T3 and T4 have strong
clustering up to the box size.

2.3. Strategy for a statistical analysis of a
finite-sample distribution

In a finite sample we need to set up a strategy for testing
the different assumptions used in the statistical analysis.
To this aim we have to make a clear distinction between
statistical quantities that are normalized to the sample
average density and those that are not. Given that the
primary scope of our study is to determine whether a
statistically meaningful estimate of the average density
is possible in the given samples, we mainly use statistical
quantities that do not require the assumption of
homogeneity inside the sample and thus avoid the
normalization of fluctuations to the estimation of the
sample average. These are thus conditional quantities, such
as the conditional density $n_i(r)$ from the $i$-th galaxy,
which gives the density in a sphere of radius $r$ centered
on the $i$-th galaxy. Conditional quantities are
well-defined both in the case of homogeneous and
inhomogeneous point distributions. If a distribution is
self-averaging inside a given sample or in the range of
scales where such a property is found to hold, then it is
possible to consider the whole-sample average of the
conditional density, which is determined by computing

\[ \overline{n(r)} = \frac{1}{M} \sum_{i=1}^{M} n_i(r) \tag{2} \]

with respect to the $i = 1, \ldots, M$ galaxies contained in
the given sample. When a distribution is nonuniform (i.e.
inhomogeneous), the conditional variance, which quantifies
the amplitude of conditional fluctuations, is such that

\[ \delta(r)^2 \equiv \frac{\overline{n(r)^2} - \overline{n(r)}^2}{\overline{n(r)}^2} \sim O(1) , \tag{3} \]

where the last equality corresponds to the fluctuations
being persistent (Gabrielli and Sylos Labini 2001). On
the other hand, for homogeneous distributions with
any kind of small-amplitude correlations we find that
(Gabrielli and Sylos Labini 2001)

\[ \delta(r)^2 \ll 1 . \]

Fig. 1. Four different realizations of the toy model discussed in the text. Upper-left panel: uniform distribution
with short-range positive correlations (T1). Upper-right panel: uniform distribution with long-range positive
correlations (T2). Bottom-left panel: nonuniform distribution with long-range positive correlations (T3). Bottom-right panel:
nonuniform distribution with long-range positive correlations and non self-averaging properties (T4).

To test whether a distribution is self-averaging inside
a given sample one may measure the PDF of conditional
fluctuations and determine whether this is stable in
different subregions of the given sample. Only when the
statistical self-averaging property is satisfied may one
consider determining whole-sample average quantities. Then
only if the conditional density is roughly constant inside
a given sample, and thus the distribution in that sample
is approximately uniform, may one consider determining
fluctuations in amplitude and their correlations
normalized to the sample density.
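As a rough illustration of this strategy, the sketch below measures the conditional density around each point of a two-dimensional point set and compares its first two moments in the two halves of the box. It is a minimal sketch under stated assumptions, not the pipeline used in this paper: the function names are hypothetical, disks replace spheres because the example is two-dimensional, only centers whose disk lies fully inside the box are used, and the brute-force distance matrix is practical only for a few thousand points.

```python
import numpy as np

def conditional_densities(points, r, box=1.0):
    """Conditional density n_i(r) around every point whose disk of
    radius r lies fully inside the box. Returns (centers, densities)."""
    inside = np.all((points >= r) & (points <= box - r), axis=1)
    centers = points[inside]
    # Brute-force squared distances between centers and all points.
    d2 = ((centers[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
    counts = (d2 <= r * r).sum(axis=1) - 1   # exclude the center itself
    return centers, counts / (np.pi * r * r)

def half_box_moments(points, r, box=1.0):
    """Mean and standard deviation of n_i(r) in the whole box and in the
    upper (y > box/2) and lower halves, as a crude stability check."""
    centers, n = conditional_densities(points, r, box)
    upper = n[centers[:, 1] > box / 2]
    lower = n[centers[:, 1] <= box / 2]
    return {name: (vals.mean(), vals.std())
            for name, vals in (("all", n), ("upper", upper), ("lower", lower))}

# Reference case: a Poisson-like (uniform random) point set.
rng = np.random.default_rng(1)
poisson = rng.uniform(0.0, 1.0, size=(1500, 2))
moments = half_box_moments(poisson, r=0.1)
```

For a uniform point set the moments in the two halves should agree within statistical scatter; systematic differences between them, as in model T4, would signal a breaking of self-averaging on that scale.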



Quantities like the two-point correlation function, whose
estimator can generally be written as
(Gabrielli et al. 2005)

\[ \overline{\xi(r)} = \frac{\overline{n(r)}}{n_S} - 1 , \tag{4} \]

measure the correlation amplitude of fluctuations with
respect to the determination of the sample average
$n_S$.^3 When the distribution is nonuniform the estimation
of the sample average is ill-defined, even if the
distribution is self-averaging inside the sample volume,
resulting in systematic effects in determining the
estimator $\overline{\xi(r)}$ (Gabrielli et al. 2005). Thus
unconditional quantities are only well-defined for uniform
distributions.

^3 We remind the reader that the previous equation is also
valid in the infinite volume limit. The various estimators that
can be found in the literature use different methods to treat
boundary conditions, hence to estimate both the numerator
and the denominator of the previous relation
(Gabrielli et al. 2005).

Fig. 3. PDF of conditional fluctuations (black line)
filtered at 1/10 of the sample size (i.e., r = 0.1) for the
toy models shown in Fig. 1. Also shown is the PDF
computed in two parts of the box, i.e. for y > 0.5 (red line)
and y < 0.5 (green line). Both the models T1 and T2
approach a Gaussian distribution (blue dashed lines), as
these distributions are uniform although correlated. The
PDF of the model T3 does not approach a Gaussian
function but it is self-averaging inside the box. Finally the
PDF of the model T4 is not Gaussian and it does not
show self-averaging properties.



3. The samples 

The SDSS (York et al. 2000) is currently the largest
spectroscopic survey of extragalactic objects, and here we
consider the data from the public data release six
(SDSS DR6) (Adelman-McCarthy et al. 2008), containing
redshifts for about 800,000 galaxies and 100,000 quasars.
There are two independent parts of the galaxy survey in the
SDSS: the main galaxy (MG) sample and the luminous
red galaxy sample. We only discuss the former. The
spectroscopic survey covers an area of 7425 square degrees
on the celestial sphere. The Petrosian apparent magnitude
limit with extinction corrections for the galaxies is 17.77
in the r-filter, and photometry for each galaxy is available
in five different bands. A detailed discussion of the
spectroscopic target selection in the SDSS MG sample can be
found in Strauss et al. (2002).

3.1. The query from the SDSS database

We used the following criteria to query the SDSS
DR6 database, in particular from the SpecPhoto view
(Adelman-McCarthy et al. 2008):



Region   $\eta_{\min}$ (deg)   $\eta_{\max}$ (deg)   $\lambda_{\min}$ (deg)   $\lambda_{\max}$ (deg)   $\Omega$ (sr)
R1        -6.0                  36.0                  -48.0                     32.5                     0.94
R2       -33.5                 -16.5                  -54.0                    -17.0                     0.15
R3       -36.0                 -26.5                  -14.0                     43.0                     0.15

Table 1. Properties of the angular regions considered.



— We constrained the flags indicating the type of object so that we select only the galaxies from the MG sample, i.e. (specClass = 2 and (primTarget & 64) > 0 or (primTarget & 128) > 0 or (primTarget & 256) > 0).

— We constrained the redshift confidence parameter to be zConf > 0.35 with flags indicating no significant redshift determination errors, i.e. zConf > 0.35 AND zWarning & 193821 = 0 AND NOT zStatus IN (0, 1, 2).

— We then considered galaxies in the redshift range $10^{-4} \le z \le 0.3$, i.e. z >= 0.0001 AND z <= 0.3. Given the low value of the lower redshift limit, nearby galaxies that are large enough may get "shredded" into smaller pieces by the SDSS automatic pipelines and may represent an unwanted contamination of the data. However, these are excluded by considering samples whose lower redshift limit is not too low and that do not contain extremely bright galaxies (see below).

— We applied the filtering condition for Petrosian apparent magnitudes with extinction corrections r < 17.77, thus taking the target magnitude limit for the MG sample in the SDSS DR6 into account, i.e. (petroMag_r - extinction_r) < 17.77.

In this way we selected 525,813 galaxies. 
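As an illustration, the four cuts listed above can be expressed as a single predicate. The following is a minimal Python sketch, not the query actually run against the database: the function name `passes_selection`, the dictionary layout of a row, and the reading of the first criterion as "specClass = 2 and at least one of the three primTarget bits set" are assumptions made here for illustration. The column names and numerical thresholds are those quoted in the text.

```python
# Bit masks and thresholds are copied from the query criteria in the text.
MAIN_GALAXY_BITS = 64 | 128 | 256  # primTarget bits selecting the MG sample
ZWARNING_MASK = 193821             # zWarning mask, as quoted in the text


def passes_selection(row):
    """Return True if a catalog row passes all four cuts.

    `row` is a plain dict keyed by SpecPhoto column names; this layout is
    an assumption made for illustration.
    """
    is_main_galaxy = (row["specClass"] == 2
                      and (row["primTarget"] & MAIN_GALAXY_BITS) > 0)
    good_redshift = (row["zConf"] > 0.35
                     and (row["zWarning"] & ZWARNING_MASK) == 0
                     and row["zStatus"] not in (0, 1, 2))
    in_redshift_range = 0.0001 <= row["z"] <= 0.3
    above_flux_limit = (row["petroMag_r"] - row["extinction_r"]) < 17.77
    return (is_main_galaxy and good_redshift
            and in_redshift_range and above_flux_limit)


# A made-up row, for illustration only (not real SDSS data).
example = {"specClass": 2, "primTarget": 64, "zConf": 0.99, "zWarning": 0,
           "zStatus": 4, "z": 0.05, "petroMag_r": 17.1, "extinction_r": 0.1}
print(passes_selection(example))
```

Writing the cuts as one predicate makes it easy to tighten a single threshold, for instance the apparent magnitude limit, and to compare the resulting samples.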
3.2. The angular regions 

We use the internal angular coordinates of the survey $(\lambda, \eta)$, which can be transformed into the usual equatorial angular coordinates by a simple rotation. The angular coverage of the survey is not uniform, as observations have been made in different disconnected sky regions. For this reason we have considered three rectangular angular regions in the SDSS internal angular coordinates. In this way we did not have to consider the irregular boundaries of the survey mask, as we cut such boundaries to avoid uneven edges of observed regions. In Table 1 we report the limits of the cuts, chosen using the internal coordinates of the survey $\eta$ and $\lambda$ (in degrees), and the sample solid angle $\Omega$ in steradians. We did not use corrections for the redshift completeness mask or for fiber collision effects. Completeness varies most near the current survey edges, which are excluded in our samples. Fiber collisions in general do not present a problem for measurements of large-scale galaxy correlations (Strauss et al., 2002).

^ see www.sdss.org
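The solid angles quoted in Table 1 can be cross-checked from the angular limits. The sketch below is only illustrative and rests on an assumption not stated in the text: that $\lambda$ behaves as a latitude-like angle and $\eta$ as a longitude-like angle, so that $d\Omega = \cos\lambda \, d\lambda \, d\eta$. The function name `solid_angle` and the `REGIONS` dictionary are introduced here for illustration.

```python
import math

# (eta_min, eta_max, lambda_min, lambda_max) in degrees, from Table 1.
REGIONS = {
    "R1": (-6.0, 36.0, -48.0, 32.5),
    "R2": (-33.5, -16.5, -54.0, -17.0),
    "R3": (-36.0, -26.5, -14.0, 43.0),
}


def solid_angle(eta_min, eta_max, lam_min, lam_max):
    """Solid angle in steradians of a rectangle in (eta, lambda).

    Assumes dOmega = cos(lambda) dlambda deta (lambda latitude-like).
    """
    d_eta = math.radians(eta_max - eta_min)
    d_sin_lam = math.sin(math.radians(lam_max)) - math.sin(math.radians(lam_min))
    return d_eta * d_sin_lam


for name, limits in REGIONS.items():
    print(name, round(solid_angle(*limits), 2))
```

Under this assumption the three values round to the solid angles listed in Table 1.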

Let us add a comment on incompleteness, which we concluded does not play a major role in our results. This conclusion is reached by considering several facts.

(i) All statistical quantities we measured, such as counts of galaxies as a function of apparent magnitude, the redshift distribution in the magnitude-limited sample, and the measurements of the correlation function in volume-limited samples, agree very well with previous works that have taken into account the variation in completeness in the whole survey area (Zehavi et al., 2002, 2005). This implies that there are no major differences in the way we treated the data, while there is a substantial difference in the interpretation of the results of the statistical analysis, as we discuss below.

(ii) Some authors use the method of making a random catalog with the same selection function as the real sample, and to this aim the detailed information given by the survey completeness mask is used. The completeness mask (mainly) takes into account that the limiting magnitude has small variations in different fields, so that a small fraction of galaxies in the photometric catalog have not been observed. There is no way to correct for such an incompleteness that is free of a priori assumptions: given that the detailed information on the real galaxy distribution is unknown, one has to make some assumptions about the statistical properties of such a distribution. On the other hand, a way of checking the possible effects of incompleteness that is free of assumptions is to limit the selection of galaxies to more stringent limits in apparent magnitude, especially for faint magnitudes. By limiting the apparent magnitude to 17.5 instead of 17.77, we found no statistical difference from the results presented in what follows (Sylos Labini et al., 2009). A similar conclusion on the survey incompleteness has been found in the two-degree field galaxy redshift survey (2dFGRS) (Sylos Labini et al., 2009). In addition it has been shown by Cabré and Gaztañaga (2009) that the completeness mask could be the main source of systematic effects only on small scales, while we are interested in the correlation function on relatively large separations.

3.3. The volume-limited samples 

To construct volume-limited (VL) samples that are unbiased for the selection effect related to the cuts in the apparent magnitude, we applied a standard procedure (see Zehavi et al., 2005). First, we computed metric distances as (Hogg, 1999)

$$ R(z; \Omega_m, \Omega_\Lambda) = \frac{c}{H_0} \int_{1/(1+z)}^{1} \frac{dy}{y \left( \Omega_m / y + \Omega_\Lambda y^2 \right)^{1/2}} , \qquad (5) $$

where we used the cosmological parameters $\Omega_m = 0.3$ and $\Omega_\Lambda = 0.7$ for the concordance model. We checked that our results do not depend significantly on the choice of cosmological parameters when taken in a reasonable range of values. This is expected since the redshifts involved in these studies are limited to z < 0.2 and relativistic redshift-distance corrections are generally small and linear in redshift for z < 1.
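For concreteness, the metric distance can be evaluated numerically. The sketch below assumes that Eq. 5 is the standard comoving-distance integral for a flat universe (Hogg, 1999), written in the variable y = 1/(1+z), and that distances are expressed in Mpc/h, i.e. with $H_0 = 100\,h$ km/s/Mpc. The function name `metric_distance` and the choice of Simpson's rule are ours.

```python
import math

C_OVER_H0 = 2997.92458  # c / H0 in Mpc/h, for H0 = 100 h km/s/Mpc


def metric_distance(z, omega_m=0.3, omega_l=0.7, steps=1000):
    """Metric (comoving) distance in Mpc/h for a flat universe.

    Integrates dy / (y * sqrt(omega_m / y + omega_l * y**2)) over
    y in [1/(1+z), 1] with the composite Simpson rule.
    """
    def integrand(y):
        return 1.0 / (y * math.sqrt(omega_m / y + omega_l * y * y))

    if steps % 2:  # Simpson's rule needs an even number of intervals
        steps += 1
    a, b = 1.0 / (1.0 + z), 1.0
    h = (b - a) / steps
    total = integrand(a) + integrand(b)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * integrand(a + k * h)
    return C_OVER_H0 * total * h / 3.0


for z in (0.01, 0.1, 0.2):
    print(z, round(metric_distance(z), 1))
```

At low redshift the result approaches the linear Hubble law $R \approx cz/H_0$, which is why the choice of cosmological parameters has only a small effect for the redshifts considered here.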



VL sample   $R_{\min}$ (Mpc/h)   $R_{\max}$ (Mpc/h)   $M_{\min}$   $M_{\max}$
VL1          50                   200                  -18.9        -21.1
VL2         100                   300                  -19.9        -22.0
VL3         125                   400                  -20.5        -22.2
VL4         150                   500                  -21.1        -22.4
VL5         200                   600                  -21.6        -22.8

Table 2. Main properties of the obtained VL samples with K-corrections and without E-corrections.



Second, the galaxy absolute magnitude was determined to be (see Zehavi et al., 2005)

$$ M_r = m_r - 5 \log_{10} \left[ R(z)(1+z) \right] - K_r(z) - 25 , \qquad (6) $$

where $K_r(z)$ is the K-correction.

We determined the $K_r(z)$ term from the NYU VAGC data (Blanton et al., 2005): to calculate the K-correction, a template fit to observed galaxy fluxes was used (Blanton and Roweis, 2007). To match these data with the results of the query from the SDSS data archive we applied the following criteria: (i) right ascension and declination must match within 1 arcsec; (ii) the relative difference between redshifts should be less than 1%. With these constraints we find 517,729 galaxies that are successfully matched and 8084 galaxies that are not matched. For unmatched galaxies we considered a polynomial approximation to $K_r(z)$,

$$ K_r(z) = a_0 + a_1 z + a_2 z^2 , \qquad (7) $$

where $a_0 = 0.006$, $a_1 = 0.847$ and $a_2 = 1.232$. The behavior of Eq. 7 corresponds to the average K-correction of matched galaxies.
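The two relations above translate directly into code. The following Python sketch assumes that Eq. 6 has the form $M_r = m_r - 5\log_{10}[R(z)(1+z)] - K_r(z) - 25$ with $R$ in Mpc/h, and uses the polynomial of Eq. 7 whenever no template-based K-correction is supplied. The function names are hypothetical, and the numerical example is not taken from the catalog.

```python
import math

K_COEFFS = (0.006, 0.847, 1.232)  # a0, a1, a2 of Eq. 7


def k_correction_poly(z):
    """Polynomial approximation to K_r(z) used for unmatched galaxies."""
    a0, a1, a2 = K_COEFFS
    return a0 + a1 * z + a2 * z * z


def absolute_magnitude(m_r, distance, z, k_corr=None):
    """Absolute magnitude from Eq. 6.

    distance is the metric distance R(z) in Mpc/h. If k_corr is None,
    the polynomial approximation of Eq. 7 is used.
    """
    if k_corr is None:
        k_corr = k_correction_poly(z)
    return m_r - 5.0 * math.log10(distance * (1.0 + z)) - k_corr - 25.0


# Illustrative values: a galaxy at the survey flux limit, R = 300 Mpc/h.
print(round(absolute_magnitude(17.77, 300.0, 0.1), 2))
```

For these illustrative inputs the result is close to the faint limit of VL2 in Table 2, as expected for a galaxy at the flux limit located near the far edge of that sample.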

As discussed above, the MG sample corresponds to the observations, in a certain sky area, of all galaxies with apparent magnitude in a given range. There is thus an intrinsic selection effect because faint galaxies can only be observed if they are close enough to us, while brighter galaxies can be observed both at low and high redshift. Thus to avoid this observational selection effect, a VL sample is defined by two cuts in distance and two in absolute magnitude, so that it covers a rectangular area in the M - z diagram (Zehavi et al., 2005). To define VL samples, we restricted apparent magnitudes to the range $14.5 < m_r < 17.77$, with the bright limit imposed to avoid the small incompleteness associated with galaxy deblending (Zehavi et al., 2005). In Table 2 we report the limits of the five VL samples we have considered: $R_{\min}$, $R_{\max}$ (in Mpc/h) are the chosen limits for the metric distance; $M_{\min}$, $M_{\max}$ define the interval for the absolute magnitude in each sample. In Table 3 we report the number of galaxies in each of the three angular regions for the five VL samples: the second column refers to the case where K-corrections have been applied, the third column to the case without K+E corrections, and the fourth column to the case with K+E corrections.
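The rectangular cut in the M - z (or, equivalently, M - R) plane can be sketched as follows. The limits are those of Table 2 (K-corrected samples); the function name `in_vl_sample` and the dictionary `VL_LIMITS` are introduced here for illustration, and the membership test simply requires the absolute magnitude to lie between the two tabulated values, whichever is numerically smaller.

```python
# name: (R_min, R_max, M_min, M_max), distances in Mpc/h, from Table 2.
VL_LIMITS = {
    "VL1": (50.0, 200.0, -18.9, -21.1),
    "VL2": (100.0, 300.0, -19.9, -22.0),
    "VL3": (125.0, 400.0, -20.5, -22.2),
    "VL4": (150.0, 500.0, -21.1, -22.4),
    "VL5": (200.0, 600.0, -21.6, -22.8),
}


def in_vl_sample(distance, abs_mag, name):
    """Return True if a galaxy falls inside the named VL sample."""
    r_min, r_max, m_a, m_b = VL_LIMITS[name]
    in_distance = r_min <= distance <= r_max
    in_magnitude = min(m_a, m_b) <= abs_mag <= max(m_a, m_b)
    return in_distance and in_magnitude


print(in_vl_sample(150.0, -20.0, "VL1"))
```

Because both cuts are independent of the apparent magnitude, every galaxy that passes them would be observable anywhere inside the sample volume, which is what removes the luminosity selection effect.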



^ see NYU Value-Added Galaxy Catalog, 2008, http://ssds.physics.nyu.edu/

VL Sample   Kcorr    None     E+K
R1VL1       36316    35372    36693
R2VL1        5939     5805     5992
R3VL1        4231     4124     4290
R1VL2       48745    49981    53086
R2VL2        9805    10020    10576
R3VL2       10363    10556    11136
R1VL3       58980    51039    54389
R2VL3       11328     9738    10416
R3VL3       11941    10410    11090
R1VL4       44503    33051    44276
R2VL4        8064     5955     8044
R3VL4        8057     6062     8125
R1VL5       25216    20685    21707
R2VL5        4360     3573     3786
R3VL5        4113     3390     3601

Table 3. Number of galaxies in each of the VL samples (VL1,...,VL5) and in each region (R1, R2, R3).



In what follows we make a detailed study to understand the effects of K-corrections and of other redshift-dependent corrections. The reason these corrections could play a role is that they introduce a redshift-dependent behavior of secondary quantities (absolute magnitude and distance) when they are derived from primary quantities (redshift and apparent magnitude). As several statistical quantities we discuss in Sect. 4 show a distance (or redshift) dependence, one may ask whether there is an effect from these corrections. To constrain the possible effects of these corrections, we discuss two different choices of them in Sects. 3.4 and 3.5. We refer to the appendix for a discussion of the derivation and the role of the cosmological corrections.

3.4. Effect of K-correction 

To study the effect induced by the K-corrections on the correlation analysis discussed in what follows, we constructed a set of VL samples without applying the $K_r(z)$ term in Eq. 6. This choice is clearly not justified from the physical point of view and can be interpreted as a way to introduce a general linear redshift-dependent correction to the absolute magnitude-redshift relation. The limits in distance of the corresponding VL samples are the same as for the samples with K-corrections, and the limits in absolute magnitude are also the same, except for VL2 and VL5, where both limits differ by 0.1 magnitudes so that the range in absolute magnitude is unchanged (see Table 4). In Table 3 we report the number of galaxies for the five VL samples: one may note that the main changes occur in the deepest samples (i.e. VL4 and VL5), where the number of objects decreases by about 20%.

VL sample   $R_{\min}$ (Mpc/h)   $R_{\max}$ (Mpc/h)   $M_{\min}$   $M_{\max}$
VL1          50                   200                  -18.9        -21.1
VL2         100                   300                  -19.8        -21.9
VL3         125                   400                  -20.5        -22.2
VL4         150                   500                  -21.1        -22.4
VL5         200                   600                  -21.5        -22.7

Table 4. The same as for Table 2 but for VL samples without K-corrections and without E-corrections.

3.5. Effect of E-correction

According to standard models of galaxy formation (Kauffmann et al., 2003), because of the evolution of stars, elliptical and spiral galaxies were more luminous in the past. To take this physical change in the galaxy properties into account, one should include some corrections to the measured luminosity. These corrections are generally unknown; i.e., there is no adequate model of evolution that allows for a proper calculation of the corrected absolute magnitudes. For this reason, and because small-scale clustering at low redshift is thought not to be affected by galaxy evolution, these corrections have frequently been neglected in the construction of VL samples (see, e.g., Zehavi et al., 2002). However, this omission is a reasonable working hypothesis only if one considers local (conditional) quantities. Indeed, as we discuss below, when one normalizes fluctuations to the sample average, one uses information concerning all scales in the sample, so that all statistical quantities derived by such a normalization are affected by the large-scale properties of the distribution inside the given sample.

As discussed in the appendix, the formula $E(z) = 1.6 z$ has been used more recently as a simple fit for the average evolution in galaxy luminosities in the recent past (see Tegmark et al., 2004; Zehavi et al., 2005). In this situation the E-corrected absolute magnitude is

$$ M_r = m_r - 5 \log_{10} \left[ R(z)(1+z) \right] - K_r(z) - 25 + E(z) . \qquad (8) $$

The limits in distance for the samples with E+K-corrections are the same as in the K-corrected samples, while the limits in absolute magnitude change (see Table 5). For this reason a rough comparison of the number of objects in each VL sample is not meaningful. In Table 3 we report the number of galaxies for the five VL samples.

VL sample   $R_{\min}$ (Mpc/h)   $R_{\max}$ (Mpc/h)   $M_{\min}$   $M_{\max}$
VL1          50                   200                  -18.8        -21.0
VL2         100                   300                  -19.7        -22.0
VL3         125                   400                  -20.4        -22.2
VL4         150                   500                  -20.9        ...
VL5         200                   600                  -21.4        ...

Table 5. The same as Table 2 but for VL samples with E+K-corrections (see text for details).

^ Note that these authors use $E(z) = 1.6(z - 0.1)$ to construct the absolute magnitude $M_{^{0.1}r}$ corresponding to the SDSS band shifted to match its rest-frame shape at z = 0.1, from the apparent magnitude $m_r$ and redshift z by using Eq. 8. In this case the $K_r(z)$ term is the K-correction from the r band of a galaxy at redshift z to the $^{0.1}r$ band. Because here we use $K_r(z)$ at z = 0 instead of at z = 0.1, the evolution correction has to be shifted by 0.1 in redshift.

4. Scale-length analysis

As discussed in Sect. 2, the main stochastic variable that we consider and for which we determine statistical properties is the conditional number of points in spheres. That is to say, we compute for each scale r the $\{N_i(r)\}_{i=1,\ldots,M}$ determinations of the number of points inside a sphere of radius r whose center is on the i-th galaxy. The number of centers M, as we discuss in more detail below, depends on the sphere radius r, i.e. M = M(r). The random variable $N_i(r)$ thus depends on the scale r and on the spatial position of the sphere's center. We can express the i-th sphere center coordinates with its radial distance $R_i$ and with its angular coordinates $\alpha_i = (\eta_i, \lambda_i)$. Thus, in general, we can write

$$ N_i(r) = N(r; R_i, \alpha_i) . \qquad (9) $$

When we integrate over the angular coordinates $\alpha_i$ for fixed radial distance $R_i$, we find that $N_i(r) = N(r; R_i)$; i.e., it depends on two variables, the length scale of the sphere r and the distance scale of the i-th sphere center $R_i$, so it has been called the scale-length analysis (Sylos Labini et al., 2009c).

^ In the appendix we discuss how several properties of galaxy fluctuations can be measured in the magnitude-limited sample by considering galaxy counts as a function of apparent magnitude and the redshift distribution. This study has the advantage of using direct observational quantities. It is interesting to note that these studies are compatible with the results presented in this section.

4.1. Number of centers as a function of scale

The reason the number of centers M(r) depends on the scale r is as follows. The sample geometry is a spherical portion delimited by the minimal and maximal values of the radial distance and by the angular coordinates reported in Table 1. For the i-th galaxy, with coordinates $(R_i, \alpha_i)$, we compute the six distances from the boundaries of the sample and consider the minimal one, $r^i_{bc}$. By simple geometrical considerations these distances are

$$
\begin{aligned}
r^1_i &= R_i \cos(\lambda_i) \sin(\eta_i - \eta_{\min}) \\
r^2_i &= R_i \cos(\lambda_i) \sin(\eta_{\max} - \eta_i) \\
r^3_i &= R_i \sin(\lambda_i - \lambda_{\min}) \\
r^4_i &= R_i \sin(\lambda_{\max} - \lambda_i) \\
r^5_i &= R_i - R_{\min} \\
r^6_i &= R_{\max} - R_i
\end{aligned}
\qquad (10)
$$

and

$$ r^i_{bc} = \min \left[ r^1_i, r^2_i, r^3_i, r^4_i, r^5_i, r^6_i \right] . $$

The length scale $r^i_{bc}$ corresponds to the radius of the largest sphere that is centered on the position of the galaxy and is fully contained in the sample volume. Since in Eq. 9 we consider only fully enclosed spheres in the sample volume, the i-th galaxy is not included in M(r) as long as the sphere radius is $r > r^i_{bc}$. In this situation, for large sphere radii M(r) decreases, and the galaxies contributing to M(r) are mostly located at radial distances in the range $[R_{\min} + r, R_{\max} - r]$, i.e. at a distance of at least r from the radial boundaries of the sample at $[R_{\min}, R_{\max}]$.
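The selection of fully enclosed spheres and the resulting counts can be sketched in a few lines of Python. This is an illustrative brute-force version under stated assumptions, not the code used for the analysis: the six boundary distances are taken to be $R\cos\lambda\sin(\eta-\eta_{\min})$, $R\cos\lambda\sin(\eta_{\max}-\eta)$, $R\sin(\lambda-\lambda_{\min})$, $R\sin(\lambda_{\max}-\lambda)$, $R-R_{\min}$ and $R_{\max}-R$; the Cartesian mapping treats $\lambda$ as latitude-like and $\eta$ as longitude-like; and the center itself is not counted in $N_i(r)$. All function names and the three test points are hypothetical.

```python
import math


def boundary_distance(R, eta, lam, limits):
    """Smallest of the six distances from a galaxy to the sample boundaries.

    Angles are in radians; limits = (R_min, R_max, eta_min, eta_max,
    lam_min, lam_max).
    """
    R_min, R_max, eta_min, eta_max, lam_min, lam_max = limits
    return min(
        R * math.cos(lam) * math.sin(eta - eta_min),
        R * math.cos(lam) * math.sin(eta_max - eta),
        R * math.sin(lam - lam_min),
        R * math.sin(lam_max - lam),
        R - R_min,
        R_max - R,
    )


def to_cartesian(R, eta, lam):
    """Cartesian position, assuming lam latitude-like and eta longitude-like."""
    return (R * math.cos(lam) * math.cos(eta),
            R * math.cos(lam) * math.sin(eta),
            R * math.sin(lam))


def counts_in_spheres(galaxies, limits, r):
    """Return the list of N_i(r) for centers whose sphere is fully enclosed.

    galaxies is a list of (R, eta, lam) tuples. The length of the returned
    list is the number of centers M(r). Cost is O(N^2): a sketch only.
    """
    xyz = [to_cartesian(*g) for g in galaxies]
    counts = []
    for i, g in enumerate(galaxies):
        if boundary_distance(*g, limits) < r:
            continue  # sphere of radius r would cross a boundary
        n = sum(1 for j, p in enumerate(xyz)
                if j != i and math.dist(p, xyz[i]) <= r)
        counts.append(n)
    return counts


# Region R1 limits (Table 1) with the radial range of VL1 (Table 2).
LIMITS = (50.0, 200.0, math.radians(-6.0), math.radians(36.0),
          math.radians(-48.0), math.radians(32.5))
# Three made-up points, for illustration only.
GALAXIES = [(120.0, math.radians(15.0), 0.0),
            (125.0, math.radians(15.0), 0.0),
            (55.0, math.radians(15.0), 0.0)]
print(counts_in_spheres(GALAXIES, LIMITS, 10.0))
```

In this toy example the third point lies only 5 Mpc/h from the inner radial boundary, so it is excluded as a center for r = 10 Mpc/h, and M(r) = 2.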

One could also choose to consider incomplete spheres, i.e. spheres that are only partially contained in the sample volume. In this case, one could weight the number of points inside the incomplete sphere by the fraction of its volume contained in the sample, thus obtaining more robust statistics, especially on large scales. We avoid this for the following reason. Suppose that outside the sample there is a large-scale structure (or a deep under-density): the weighting above will underestimate (or overestimate) the real number of points inside the full sphere with respect to the incomplete one. This inevitably introduces a bias in the measurements, which affects large-scale determinations. As it is precisely the scope of our study to determine the properties of large spatial fluctuations, we avoid using a method that implicitly assumes that these are irrelevant (Gabrielli et al., 2005; Sylos Labini et al., 2009b).



Given their different sizes, the number of centers as a function of scale M(r) is quantitatively different in each of the five VL samples. However, one may note from Fig. 4 that the behavior of M(r) is similar in the different cases. For small sphere radii almost all galaxies are included; i.e., M(r) is equal to the number of points contained in the sample. Instead, when the sphere radius becomes comparable to the radius of the largest sphere that is fully contained in the sample volume, M(r) shows a fast decay. The scale at which this occurs grows proportionally to the depth of the VL sample (taking the sample solid angle fixed). The largest scales explored in this survey, i.e. r ≈ 100 Mpc/h, can only be reached with the deepest VL samples.

4.2. Probability distribution of conditional fluctuations 

The main information about the statistical properties of the random variable $N_i(r)$ is provided by its PDF, $P(N, r)$. This gives the probability of finding N points in a spherical volume of radius r centered on a distribution point. It should be noticed that this is different from the PDF of unconditional fluctuations, which provides the probability that in a spherical volume of radius r centered on an arbitrary point of space there are N points (Saslaw, 2000). Only when unconditional properties are well-defined do the conditional and the unconditional PDFs give similar results (Gabrielli et al., 2005).



Fig. 4. Number of centers M(r) as a function of scale in the five VL samples (see text for details).

Fig. 6. The PDF in different samples (with K-corrections only) and for different sphere radii with the best-fit Gaussian function (see captions). Poisson error bars are reported as a reference.

The frequency distribution in bins of conditional fluctuations at fixed scale r gives an estimation of the PDF at that scale. The error bars are computed as the square root of the number of points in each bin. To compare the behavior in different VL samples, which are defined by different luminosity cuts and thus generally contain galaxies of different absolute magnitudes, we define the normalized variable

$$ x = \frac{N_i(r) - \overline{N}(r)}{\Sigma(r)} \qquad (11) $$

and we determine its PDF, that is,

$$ p(x, r) = P\left( N(r) = \overline{N}(r) + x \Sigma(r) \right) \times \Sigma(r) , \qquad (12) $$

where $P(N(r)) = P(N, r)$ is the PDF of the variable $N_i(r)$, $\overline{N}(r)$ is its estimated whole-sample first moment, and $\Sigma(r)$ is the estimated standard deviation on the scale r.
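A histogram estimate of the normalized PDF can be sketched as follows. The code standardizes the counts $N_i(r)$ with the whole-sample mean and standard deviation, bins the resulting variable x, and attaches Poisson error bars (square root of the number of points in each bin, scaled by the same normalization). The function name `normalized_pdf`, the equal-width binning, and the use of the unbiased (M - 1) variance estimator are choices made here for illustration; the input counts are made up.

```python
import math


def normalized_pdf(counts, n_bins=10):
    """Histogram estimate of p(x, r) for x = (N_i - mean) / sigma.

    Returns a list of (x_center, pdf, error) tuples, one per bin.
    Assumes the counts are not all equal (sigma > 0).
    """
    m = len(counts)
    mean = sum(counts) / m
    sigma = math.sqrt(sum((n - mean) ** 2 for n in counts) / (m - 1))
    x = [(n - mean) / sigma for n in counts]

    lo, hi = min(x), max(x)
    width = (hi - lo) / n_bins
    hist = [0] * n_bins
    for v in x:
        k = min(int((v - lo) / width), n_bins - 1)  # put x = hi in last bin
        hist[k] += 1

    norm = m * width  # makes the histogram integrate to one
    return [(lo + (k + 0.5) * width, c / norm, math.sqrt(c) / norm)
            for k, c in enumerate(hist)]


# Made-up counts, for illustration only.
COUNTS = [3, 5, 7, 8, 10, 12, 15, 20, 4, 6]
for x_center, pdf, err in normalized_pdf(COUNTS, n_bins=5):
    print(round(x_center, 2), round(pdf, 3), round(err, 3))
```

Because x is dimensionless, histograms obtained in samples with different mean densities can be overlaid directly, which is the purpose of the normalization.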

In Fig. 5 we show the PDF, estimated in the region R1 only, of the VL samples with K-corrections, of the samples where E+K corrections have been applied, and finally of the samples in which no corrections have been applied. In Fig. 6 we also show, but only for some cases, the PDF with the estimated Poisson error bars, together with the best fit obtained by a Gaussian function. The PDF is not affected by E and/or K corrections, even in the deepest samples such as VL4 and VL5. For this reason, and given that E-corrections are not well-defined, as discussed above, in what follows we mostly focus on the case where only K-corrections have been applied.

It is interesting to compare results for r = 5, 10, 20, 30 Mpc/h in different K-corrected VL samples (see Fig. 7): the PDFs collapse fairly well onto one another. The overall shape is characterized by a long (or fat) tail, slowly decaying, at high x values, which makes it substantially different from a Gaussian function. This is the effect of the large structures (i.e. large fluctuations) contained in these samples. Similar behaviors have been found in the 2dFGRS (Sylos Labini et al., 2009a,b).

^ The PDF of VL1 for r = 30 Mpc/h is not as regular as the other cases because of poor statistics.



Both for small (i.e., r < 30 Mpc/h) and large (i.e., r > 30 Mpc/h) sphere radii the PDF does not converge to a Gaussian function. Actually, for the largest sphere radii (i.e. r = 80, 100 Mpc/h), in the samples VL4 and VL5, the PDF shows a relatively long tail for low x values followed by a sharp cut-off at values higher than the peak of the PDF. We interpret this behavior as due to an intrinsic bias: given the finiteness of the sample volume, only a few structures can be contained in it, and thus this statistical measurement cannot give a reliable estimate of large-scale fluctuations. Already in the less distant samples (e.g., VL1, VL2, and VL3), the main trends discussed above are clearly present up to sphere radii r ≈ 80 Mpc/h. The distant samples (e.g., VL4 and VL5), where the effect of other cosmological corrections may be more important, allow us to reach scales of ~ 100 Mpc/h.

To summarize the main results: (i) The PDF is not affected by E and/or K corrections. (ii) For scales on which conditional fluctuations are self-averaging and the PDF is stable in different sample subregions, i.e. for r < 30 Mpc/h, the overall shape of the PDF is characterized by a long (or fat) tail that makes it substantially different from a Gaussian function. (iii) For r > 30 Mpc/h, the PDF does not converge to a Gaussian function and it has a different shape in different samples. In the next section we present specific measurements to study the large-scale properties of conditional fluctuations in these samples, testing their self-averaging properties.

4.3. Test for statistical self-averaging 

To study the origin of the differences in the behavior of the PDF in different VL samples, for large enough sphere radii, we can consider a specific test. This is useful for studying the self-averaging properties of the distribution in a given sample. This test allows us to establish whether, inside a given sample, it is meaningful to derive, for instance, whole-sample average quantities and whether we can consider that a certain estimator gives a reliable and stable measurement of the ensemble properties of the distribution.

We divide the sample volume into two nonoverlapping subvolumes of the same size, a nearby one of volume $V_n$ and a more distant one of volume $V_f$, and we determine whether statistical quantities are stable or show systematic differences in these subsamples. In principle the ideal test would be to compute the PDF in many different nonoverlapping subvolumes, more than the two we use here. The limitation we face in doing this stems only from the data available in the SDSS DR6 and the corresponding sample volumes. In future data releases, once the regions R1, R2, and R3 become contiguous, we will be able to consider more subvolumes of a single sample.

Fig. 5. Conditional PDF on different scales for the five VL samples (each row corresponds to a VL sample; the scale r is reported in the caption) with K corrections (black), with K+E corrections (red), and without K+E corrections (green).

Fig. 7. Normalized PDF (see Eqs. 11-12) for r = 5, 10, 20, 30 Mpc/h in the five VL K-corrected samples.



Given the two limits of the sample in radial distance, $R_{\min}$ and $R_{\max}$, we computed the distance $R_h$ at which $V_n = V_f$, thus obtaining

$$ R_h = \left[ \frac{R_{\max}^3 + R_{\min}^3}{2} \right]^{1/3} . \qquad (13) $$

To increase the statistics, for a large enough sphere radius r, we have allowed the center of a sphere of radius r to be at a distance d from $R_h$ such that r > d. In this situation the sphere, whose center is placed, e.g., in the less distant subsample, has part of its volume in the more distant subsample and vice versa. Thus a certain overlap of the determinations of $N_i(r)$ is allowed between the two half-regions. This method gives a conservative estimate of the actual fluctuations between the subsamples. Indeed the overlapping of different determinations clearly smooths out fluctuations between the two subsamples: thus any difference we find is certainly a genuine feature of the distribution.
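The split into two equal-volume halves can be illustrated with a short Python sketch. It assumes that Eq. 13 is the equal-volume condition for a shell of fixed solid angle, $R_h^3 - R_{\min}^3 = R_{\max}^3 - R_h^3$, and that each determination is stored as a pair (radial distance of the center, count). The function names `half_volume_radius` and `split_counts` and the example pairs are hypothetical.

```python
def half_volume_radius(r_min, r_max):
    """Radial distance R_h that splits the shell into two equal volumes."""
    return ((r_min ** 3 + r_max ** 3) / 2.0) ** (1.0 / 3.0)


def split_counts(centers, r_min, r_max):
    """Split (R_i, N_i) pairs into nearby and distant determinations.

    A center is assigned to the half that contains it, even if its sphere
    extends into the other half, as allowed in the text.
    """
    r_h = half_volume_radius(r_min, r_max)
    near = [n for R, n in centers if R < r_h]
    far = [n for R, n in centers if R >= r_h]
    return near, far


# Radial limits of VL1 (Table 2); the (R_i, N_i) pairs are made up.
r_h = half_volume_radius(50.0, 200.0)
print(round(r_h, 1))
print(split_counts([(100.0, 5), (180.0, 9)], 50.0, 200.0))
```

Since volume grows as the cube of the distance, $R_h$ lies much closer to $R_{\max}$ than to $R_{\min}$, so the distant half is a thinner shell than the nearby one. The PDF of the counts in `near` and `far` can then be estimated separately and compared.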

In addition, for each VL sample we consider the PDF determined by all the values, at fixed r, in all three sky regions. The determination of $N_i(r)$ has to be done separately, for each VL sample, in the three different sky regions R1, R2, and R3 because of the geometrical constraints discussed above. This allows us to improve the statistics, although the R1 region contains about a factor of 10 more galaxies than the other two regions and its larger volume allows many more determinations than in the other two regions. In particular, for a large enough sphere radius only the values in the R1 region can be measured.

Results for K-corrected samples are shown in Fig. 8. The peak of the PDF in the two half volumes of the different VL samples is located approximately at the same N value for r < 30 Mpc/h: although in this range of sphere radii a difference is sometimes detectable in the location of the peak (e.g., in the samples VL4 and VL5), the overall shape of the PDF does not substantially change in the two subvolumes. Instead, for r > 30 Mpc/h the whole PDF shows a systematic shift, because the shape is very sensitive to the different kinds of fluctuations (structures) present in each subvolume. In this situation the estimation of the first and second moments in the whole sample is affected by systematic effects that preclude extracting statistically meaningful information from them.

In all samples but VL2, the PDF is shifted more toward lower N values in the nearby part of the sample than in the more distant one. This occurs because fluctuations are generally wilder in the more distant part of the sample. This is the effect of the sample geometry: larger structures can only be found where the geometry of the sample volume allows them to be contained, and indeed this happens toward the far boundaries of the samples.

The sample VL2, for r > 20 Mpc/h, shows an interesting and peculiar feature: the PDF in the nearby subvolume is shifted toward higher N values than that in the more distant one. In this case, there is a large under-dense region for R > 220 Mpc/h extending up to the limits of the sample at R = 300 Mpc/h (see discussion below). The trend found in VL2 is interesting because it shows that large fluctuations do not occur only in the more distant part of the sample volume, as would be expected if they were due to a systematic selection effect rather than to structures. It shows instead that there is no such systematic trend common to all the samples.

This situation clearly agrees with the behavior of the whole-sample PDF discussed in the previous section, particularly with the fact that there are, at the same sphere radius r, detectable changes in the shape of the PDF in different VL samples. This implies that the sample volumes are not large enough to allow a stable determination of the PDF and its moments for sphere radii r > 30 Mpc/h.

As a final remark, to reach the important conclusion about the non-self-averaging properties of conditional fluctuations, when they are filtered on scales r > 30 Mpc/h, it is enough to consider the nearby samples VL1, VL2, and VL3. In these samples, due to the narrow range of redshifts involved, any type of cosmological correction other than the ones considered here is expected to perturb our results only slightly.



4.4. Effect of K-corrections and evolutionary corrections

As illustrative examples of the situation in the samples with E+K corrections, and in those where no corrections are applied at all, we show in Fig. 9 the cases of VL3 and VL5. In the latter the corrections, because of the relatively high redshifts involved, are expected to modify the behaviors more. As one can see from these figures, there is no substantial change with respect to the case where only K-corrections are applied. Thus even in this case, the effect of K+E corrections represents a minor modification of the measured behaviors.

4.5. Average in bins

To determine the features of galaxy structures on different scales, we now consider a local average of $N_i(r)$ computed in the following way. We divide the whole range of radial distances in each VL sample into bins of thickness $\Delta R$ and we compute the average

$$ \overline{N}(r; R, \Delta R) = \frac{1}{M_b} \sum_{j=1,\; R_j \in [R, R+\Delta R]}^{M_b} N(r; R_j) , \qquad (14) $$

where the sum is extended to the $M_b$ determinations of $N_i(r) = N(r; R_i)$ such that the radial distance of the i-th center is in the interval $[R, R + \Delta R]$. Its variance can be estimated by

$$ \Sigma^2(r; R, \Delta R) = \frac{1}{M_b - 1} \sum_{j=1,\; R_j \in [R, R+\Delta R]}^{M_b} \left( \overline{N}(r; R, \Delta R) - N(r; R_j) \right)^2 . \qquad (15) $$

To study the sequence of structures and voids present in the samples, we choose a relatively small radial bin, i.e. $\Delta R = 10$ Mpc/h, and consider the sphere radius r = 10 Mpc/h. It is clear that, as $r = \Delta R$, there is some overlap in the determinations in contiguous bins, resulting in an artificial smoothing of the signal. This means that the fluctuations we detect in this way represent a lower limit to the real ones. In Fig. 10 we show the behavior of Eq. 14 in bins of thickness $\Delta R = 10$ Mpc/h for sphere radius r = 10 Mpc/h, normalized to the whole sample average (see Eq. 16 below), for the three sets of five VL samples (region R1) with different corrections, as defined in Sect. 3. In VL1, VL2, and VL3 the signal is completely unaffected by corrections, while in VL4 and VL5 there is a small effect that, however, does not change the main trends. In addition the inset panels of Fig. 10 show the number of centers that contribute to the average in each bin. The fact that this grows as a function of the radial distance reflects the limitations imposed by the sample geometry discussed in Sect. 4.1. Below we summarize the situation.


– In the VL1 sample there are fluctuations of ~ 40%. There is no well-defined radial-distance trend; instead the scatter in the measurements corresponds to the location of large-scale structures. The behavior is insensitive to the effect of the K and/or E corrections considered.

– In the VL2 sample there is a high over-density in the radial distance range [180, 220] Mpc/h, which is followed by a sharp decay, signaling a relative under-density for R > 220 Mpc/h. Even in this case there is no detectable impact of the K and/or E corrections considered.

– The high over-density up to R ≈ 200 Mpc/h is also visible in the VL3 sample, and is followed by an under-density in the range 220 < R < 270 Mpc/h. Beyond 300 Mpc/h there is another relative over-density extending up to the sample boundaries. The effect of E-corrections is to amplify the over-density at R ≈ 200 Mpc/h relative to the under-density on larger scales.

– The behavior in VL4 is similar to the one in VL3. Here the sharp fall in the average conditional density in bins at ~ 220 Mpc/h is followed by a relatively slow growth, which seems to saturate at ~ 370 Mpc/h at about the same level as the fluctuation at ~ 200 Mpc/h. The effect of K and/or E corrections is to amplify the difference between the amplitudes of fluctuations at short and long radial distances.

– Even in the case of the sample VL5 the average behavior is quantitatively but not qualitatively changed by the effect of K and/or E corrections. In this sample, as well as in VL4, there is a coherent trend over the whole sample volume, which is a signature of persisting large-scale fluctuations.

Fig. 8. PDF in the two subvolumes of the K-corrected VL samples (each row corresponds to a VL sample): the black line marks the PDF in the nearby subsample, and the red line that in the more distant subsample. The x-axis reports the number of points N(r) (the scale r is reported in the caption) and the PDF P(N; r) is on the y-axis.

Fig. 9. As Fig. 8 but for the K+E-corrected VL3 and VL5 samples (left) and for the same samples without K+E corrections (right).

4.6. Normalization of the behaviors in different VL samples

We can now normalize the behaviors of the radial density and of the average conditional density in bins discussed in Sect. 4.5 in the different VL samples. This is done by computing the normalizing factors for the different VL samples assuming Eq. C.2 and by knowing the galaxy luminosity function (Joyce and Sylos Labini, 200…). In this approximation the observed radial density in the VL1 sample can be written as



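$$n_{VL1}(R) = n(R) \times \int_{L_1}^{L_2} \phi(L)\,dL = n(R) \times \Phi_{VL1}, \qquad (16)$$

where $L_1$ and $L_2$ are respectively the limits at the faint and bright absolute luminosities of the sample VL1, and we have defined

$$\Phi_{VL1} = \int_{L_1}^{L_2} \phi(L)\,dL. \qquad (17)$$

Clearly, the radial density, for instance, in the sample VL2, can be normalized to that of VL1 by computing

$$n^{norm}_{VL2}(R) = n_{VL2}(R) \times \frac{\Phi_{VL1}}{\Phi_{VL2}}. \qquad (18)$$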



Hereafter, to compute the normalization factors we use the best-fit parameters of the luminosity function found in Appendix C. The normalization factor for VL5 is the most uncertain because the measured luminosity function deviates from the simple Schechter function fit at bright magnitudes.
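The normalization factors described above can be sketched numerically as follows. This is only an illustration under stated assumptions: the luminosity function is taken to be a pure Schechter function expressed in absolute magnitudes, the integration uses a simple composite Simpson rule, and every name and numerical value (`phi_star`, `M_star`, `alpha`, the magnitude limits) is a hypothetical placeholder rather than a best-fit value from Appendix C.

```python
import math


def schechter_mag(M, phi_star, M_star, alpha):
    """Schechter luminosity function per unit absolute magnitude."""
    x = 10.0 ** (0.4 * (M_star - M))  # x = L / L*
    return 0.4 * math.log(10.0) * phi_star * x ** (alpha + 1.0) * math.exp(-x)


def lf_integral(M_bright, M_faint, phi_star, M_star, alpha, n=2000):
    """Integral of the luminosity function between two magnitude limits.

    Composite Simpson rule with n (even) sub-intervals.
    """
    h = (M_faint - M_bright) / n
    total = schechter_mag(M_bright, phi_star, M_star, alpha)
    total += schechter_mag(M_faint, phi_star, M_star, alpha)
    for k in range(1, n):
        weight = 4.0 if k % 2 else 2.0
        total += weight * schechter_mag(M_bright + k * h, phi_star, M_star, alpha)
    return total * h / 3.0


def normalization_factor(limits_ref, limits_other, phi_star, M_star, alpha):
    """Factor that rescales the density of one VL sample to a reference one.

    Each limits tuple is (M_bright, M_faint); the factor is the ratio of the
    luminosity-function integrals over the two magnitude ranges.
    """
    ref = lf_integral(limits_ref[0], limits_ref[1], phi_star, M_star, alpha)
    other = lf_integral(limits_other[0], limits_other[1], phi_star, M_star, alpha)
    return ref / other


if __name__ == "__main__":
    # Hypothetical magnitude limits and Schechter parameters, for illustration only.
    factor = normalization_factor((-20.0, -19.0), (-22.0, -21.0), 1.0, -20.5, -1.1)
    print(f"density rescaling factor: {factor:.3f}")
```

Multiplying the radial density measured in the second sample by this factor places it on the scale of the reference sample, under the assumption that space and luminosity distributions are independent.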

Figure 11 shows the distance behavior of the normalized radial counts of galaxies in the region R1. A persistent growth of the density for distances R > 300 Mpc/h is found, while for smaller radial distances there is the fluctuating behavior already discussed in Sect. 4.5. This is very similar to Fig. 10 (bottom-right panel), where we considered the average in bins of the SL analysis, i.e. Eq. 14, as a function of the radial distance for r = 10 Mpc/h, with the same normalization factors as are used for the radial density. Indeed the same approximations used to derive the radial density normalization can be used to normalize the average SL data. The normalization factors obtained in this way allow us to produce a single behavior from 50 Mpc/h to 600 Mpc/h. The main features are again the over-density at R ≈ 200 Mpc/h, the relative under-density in the range [220, 300] Mpc/h, and the persistent growth for R > 300 Mpc/h.

In Appendix C we discuss the determination of the luminosity function and of two important, commonly used assumptions: that the space density is constant and that space and luminosity distributions are independent. We emphasize that the latter can be used also when the density field is inhomogeneous, while the former corresponds to the strict assumption of spatial homogeneity.

Although these behaviors look very similar, they refer to two different measurements, which in principle are not expected to give the same behavior.
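The average in radial bins of Eq. 14 can be sketched as below. This is a minimal illustration, assuming that the radial distance $R_i$ and the count $N_i(r)$ of each center are already available as plain lists; the function name, the `norm` argument (standing for a luminosity normalization factor) and the sample numbers are hypothetical.

```python
from collections import defaultdict


def binned_average(radial_distances, counts, r_start, delta_r, norm=1.0):
    """Average of N_i(r) over centers grouped in radial bins of thickness delta_r.

    Returns {lower bin edge: (normalized average, number of centers M_b)}.
    Centers closer than r_start are ignored; empty bins are omitted.
    """
    sums = defaultdict(float)
    n_centers = defaultdict(int)
    for R, N in zip(radial_distances, counts):
        if R < r_start:
            continue
        b = int((R - r_start) // delta_r)  # index of the bin containing R
        sums[b] += N
        n_centers[b] += 1
    return {
        r_start + b * delta_r: (norm * sums[b] / n_centers[b], n_centers[b])
        for b in sorted(sums)
    }


if __name__ == "__main__":
    # Toy input: five centers with their radial distances (Mpc/h) and counts.
    result = binned_average([55, 60, 120, 130, 140], [10, 20, 3, 6, 9], 50, 50)
    for edge, (mean, m_b) in result.items():
        print(f"R in [{edge}, {edge + 50}): mean N = {mean}, centers = {m_b}")
```

Returning the number of centers per bin alongside the mean mirrors the information shown in the inset panels, since bins with few centers carry larger statistical uncertainty.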



In this way we reach a completely different conclusion from that of Loveday (2004). Indeed, from the analysis of the luminosity function for galaxies selected in four redshift slices (0.001 < z < 0.1, 0.1 < z < 0.15, 0.15 < z < 0.2 and 0.2 < z < 0.3), and despite the uncertainties in the shape of the luminosity function in the redshift slices, Loveday (2004) concluded that there is clear evolution in the amplitude of the luminosity function, in the sense of an increasing amplitude (vertical shift) and/or luminosity (horizontal shift) with redshift. On the other hand, we conclude that the behavior of the radial counts of galaxies as a function of distance is consistent with the average conditional number of galaxies in spheres as a function of the radial distance, i.e. Eq. 14. The behaviors of $\overline{N(r; R, \Delta R)}$ can be normalized simply by using the results obtained in the same samples for the luminosity function. Thus our conclusion is perfectly consistent with the measurements of the PDF presented in the previous section, and it does not imply that a strong evolution has occurred up to z = 0.2. Rather, as discussed above for the behavior of the average conditional density in bins, we can trace the various main structures in these samples: namely, there are large fluctuations at about 200 Mpc/h followed by a large under-dense region up to 400 Mpc/h, which is then followed by other coherent structures up to the sample limits.

Fig. 11. Radial counts in bins of thickness $\Delta R$ (in Mpc/h), normalized to the luminosity factors as explained in the text, for the K-corrected VL samples.

Fig. 12. Whole-sample average conditional density in the different K-corrected VL samples in the regions R1, normalized as explained in the text.



4.7. The whole sample average and the variance

When Eq. 9 is averaged over the whole sample, it gives an estimate of the average conditional density

$$\overline{n(r)} = \frac{3}{4\pi r^3} \frac{1}{M(r)} \sum_{i=1}^{M(r)} N_i(r). \qquad (19)$$



In Fig. 12 we show the whole-sample average conditional density in the different K-corrected VL samples, normalized by using Eq. 18. Contrary to the behavior of the radial number counts and of the SL statistics averaged in bins (Eq. 14), in this case the behaviors of the average conditional density in different samples do not overlap in a satisfactory way. This is because the whole-sample average is biased by the lack of self-averaging properties and does not give a reliable estimate of the ensemble quantity. Regardless of its amplitude, the quantity $\overline{N(r)}$ shows a power-law behavior with exponent β = 2.2 ± 0.1 up to ~ 30 Mpc/h. On larger scales, its determination is strongly affected by the non-self-averaging properties of conditional fluctuations discussed above. To reliably detect uniformity, the conditional density has to be flat over a wide enough range of scales, while in the data we measure a different scale dependence for r > 20 Mpc/h than on small scales, but we cannot detect a clear flattening. However, in view of the large fluctuations detected by the complete PDF analysis and by the self-averaging test, we conclude that there is no crossover to uniformity up to ~ 100 Mpc/h. We need to consider larger samples to properly constrain correlation properties for scales r > 30 Mpc/h.
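As a sketch of how a power-law exponent of this kind can be extracted, the snippet below evaluates the whole-sample average of Eq. 19 and fits a straight line in log-log space by ordinary least squares. The fitting procedure is an assumption made for illustration (the text does not specify how the exponent and its uncertainty were obtained), and the input counts are synthetic.

```python
import math


def average_conditional_density(counts, r):
    """Whole-sample average conditional density (Eq. 19) at scale r.

    counts holds the N_i(r) values of the M(r) centers available at that scale.
    """
    return 3.0 / (4.0 * math.pi * r ** 3) * sum(counts) / len(counts)


def fit_power_law(xs, ys):
    """Least-squares fit of y = A * x**slope in log-log space.

    Returns (slope, A).
    """
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((a - mean_x) ** 2 for a in lx)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    slope = sxy / sxx
    return slope, math.exp(mean_y - slope * mean_x)


if __name__ == "__main__":
    # Synthetic counts growing as r**2.2, so the density should decay as r**-0.8.
    radii = [2.0, 5.0, 10.0, 20.0]
    densities = [average_conditional_density([r ** 2.2] * 3, r) for r in radii]
    slope, amplitude = fit_power_law(radii, densities)
    print(f"density exponent: {slope:.3f}, amplitude: {amplitude:.4f}")
```

A count exponent of 2.2 and a density exponent of -0.8 are the same statement, since the density is the count divided by a volume growing as $r^3$.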

As mentioned above, for r < 30 Mpc/h in all samples the PDF is stable with respect to the K+E corrections, so there are no detectable differences in the three sets of VL samples with different corrections. Thus, while for scales r < 30 Mpc/h the data show an approximate power-law behavior, for r > 30 Mpc/h we are not able to reach a reliable conclusion, in these samples, about the behavior of this quantity, because conditional fluctuations do not exhibit self-averaging properties when filtered on scales r > 30 Mpc/h. The large-scale inhomogeneity shown by the non-self-averaging conditional fluctuations is compatible with a continuation of power-law correlations, i.e. scaling properties, to scales larger than 30 Mpc/h.

4.8. The standard two-point correlation function 

When determining the standard two-point correlation 
function, we implicitly make two assumptions that, in- 
side a given sample, (i) the distribution is self-averaging 
and (ii) it is uniform. The first assumption is used when 
computing whole-sample average quantities. For instance 
it is assumed when the whole-sample average conditional 
density is measured, as discussed in the previous section. 
However, in that case, there is need to assume that the 
estimation of the sample average gives a fairly good esti- 
mation of the ensemble average density. This corresponds 
to the assumption (ii) above. When one of these assump- 
tions, or both, is not verified then interpretation of the 
results given by the determinations of the standard two- 
point correlation function must be reconsidered with great 
care as we discuss in what follows. 

To measure the two-point correlation function, the most commonly used estimators are based on pair-counting algorithms, such as the Davis and Peebles (Davis and Peebles, 1983) (DP) and the Landy and Szalay (1993) (LS) estimators. These are relatively easy to implement in practice by generating random distributions in artificial samples with the same geometry as the real ones. In general, it is not straightforward to interpret the results obtained with these estimators at large enough scales; i.e., from around the scale at which the spherical shell of radius r, centered on a typical distribution point, is only partially contained in the sample volume (see Gabrielli et al., 2005). This scale is the one up to which one can calculate the so-called full-shell (FS) estimator, i.e. the one in which only complete spherical shells are considered (Kerscher, 1999).

The FS estimator considers, similarly to the case of the conditional density estimator, a pair of points at distance r only if a sphere of radius r, centered on one of the points, is fully contained in the sample volume. Thus this method, because it requires fewer assumptions, is the one we consider in more detail here. The FS estimator can be written as (Gabrielli et al., 2005)



$$\xi(r) + 1 = \frac{\overline{N(r, \Delta r)}}{V(r, \Delta r)} \cdot \frac{1}{n_S}. \qquad (20)$$



The first ratio in the r.h.s. of Eq. 20 is the average conditional density, i.e., the number of galaxies in shells of thickness Δr averaged over the whole sample, divided by the volume V(r, Δr) of the shell. The second ratio in the r.h.s. of Eq. 20 involves the density $n_S = N/V$ estimated in a sample containing N galaxies, with volume V. Thus, the FS estimator requires determination of the distances of all points to the boundaries, as for the case of the conditional density (see Eq. 9 and Eq. 19). However, it should be stressed that, when measuring this function, we implicitly assume, in a given sample, that (i) fluctuations are self-averaging in different subvolumes and (ii) the linear dimension of the sample volume is $V^{1/3} \gg \lambda_0$ (Gabrielli et al., 2005), i.e., the distribution has reached homogeneity inside the sample volume. If one of them, or both, is not verified in the actual data, then the amplitude and shape of the estimated ξ(r) will strongly depend on the sample volume. This finite-size dependence can be investigated by making specific tests, as we discuss in what follows. We stress that the most efficient way to test the above assumptions is represented by the determination of the conditional fluctuations presented in the previous sections.
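The full-containment condition shared by the conditional density and the FS estimator can be sketched as follows. This is a simplified illustration, assuming a cubic sample volume [0, box]^3 instead of the actual survey geometry, counting in full spheres rather than shells, and using a brute-force neighbor search; the function names are hypothetical.

```python
import math


def boundary_distance(point, box):
    """Distance from a point to the nearest face of the cube [0, box]^3."""
    return min(min(c, box - c) for c in point)


def counts_in_spheres(points, r, box):
    """N_i(r) for every center whose sphere of radius r lies fully inside the box.

    Centers closer than r to a boundary are skipped, so no correction for
    partially contained spheres is ever needed.
    """
    counts = []
    for i, p in enumerate(points):
        if boundary_distance(p, box) < r:
            continue  # sphere would cross the sample boundary
        n = sum(
            1
            for j, q in enumerate(points)
            if j != i and math.dist(p, q) <= r
        )
        counts.append(n)
    return counts


if __name__ == "__main__":
    pts = [(5, 5, 5), (5, 5, 6), (5, 6, 5), (1, 1, 1), (9, 9, 9)]
    print(counts_in_spheres(pts, 2.0, 10.0))
```

The number of valid centers shrinks as r grows, and those that remain are confined to the interior of the volume, which is why whole-sample averages at large r are dominated by a small region of the sample.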

To show how non-self-averaging fluctuations inside a given sample bias the ξ(r) analysis, we consider the estimator

$$\xi(r; R, \Delta R) + 1 = \frac{\overline{N(r, \Delta r)}}{V(r, \Delta r)} \cdot \frac{V(r^*)}{\overline{N(r^*; R, \Delta R)}}, \qquad (21)$$



where the second ratio on the r.h.s. is now the inverse of the density of points in spheres of radius r* averaged over the galaxies lying in a shell of thickness ΔR around the radial distance R. If the distribution is homogeneous, i.e., $r^* > \lambda_0$, and statistically stationary, Eq. 21 should be statistically independent of the range of radial distances (R, ΔR) considered.

Indeed the two-point correlation function is defined as a ratio between the local conditional density and the sample average density: if both vary in the same way when the radial distance is changed, then its amplitude remains nearly constant. This does not imply, however, that the amplitude of ξ(r) is meaningful, as the density estimated in subvolumes of size r* can show large fluctuations, and this occurs with a radial-distance dependence. To show that the ξ(r) analysis gives a meaningful estimate of the amplitude of fluctuations, one has to test that this amplitude remains stable when changing the relative position of the subvolumes of size r* used to estimate the local conditional density and the sample average density. This is achieved by using the estimator in Eq. 21. On the other hand, standard estimators are unable to test for such an effect, as the main contributions to both the local conditional density and the sample average density come from the same part of the sample (typically the more distant part, where the volume is larger).

Fig. 13. Standard two-point correlation function in the VL3 sample estimated by Eq. 21: the sample average density is computed in spheres of radius r* = 60 Mpc/h, considering all center-points lying in a bin of thickness ΔR = 50 Mpc/h centered at different radial distances R: $R_1$ = 250 Mpc/h ($n_s^1$) and $R_2$ = 350 Mpc/h ($n_s^2$). The case in which we have used the estimation of the sample average N/V ($n_s$) is also shown, and it agrees with the FS estimator. The FS estimator, in turn, agrees with the measurements provided by the LS and DP estimators, which give essentially the same result. (For the sake of clarity error bars are shown for the FS, DP and LS estimators, and they are relatively small except at small and large r.)

For instance we consider, in the VL3 sample, ΔR = 50 Mpc/h and R = 250 Mpc/h or R = 350 Mpc/h, with r* = 60 Mpc/h. We thus find large variations in the amplitude of ξ(r) (see Figs. 13-14). This is simply an artifact generated by the large density fluctuations on scales close to the sample sizes. The result that the estimator of Eq. 20, or others based on pair counting (Gabrielli et al., 2005; Sylos Labini and Vasilyev, 2008), has nearly the same amplitude in different samples, e.g., (Davis and Peebles, 1983; Park et al., 1994; Benoist et al., 1996; Zehavi et al., 2002, 2005; Norberg et al., 2001, 2002), despite the large fluctuations of $N_i(r; R)$, is simply explained by the fact that ξ(r) is a ratio between the local conditional density and the sample average density. Both vary in the same way when the radial distance is changed, so the amplitude is nearly constant.

Fig. 14. The same as in Fig. 13 but now for the VL5 sample. In this case the sample average density is computed in spheres of radius r* = 80 Mpc/h, considering all center-points lying in a bin of thickness ΔR = 80 Mpc/h centered at different radial distances R: $R_1$ = 320 Mpc/h ($n_s^1$) and $R_2$ = 450 Mpc/h ($n_s^2$).

To understand how large-scale fluctuations can be hidden in the analysis performed by the two-point correlation function, we consider the following simple example. Let us suppose the catalog consists of two disconnected volumes: for simplicity we fix them to be spherical with radii $R_s^1$ and $R_s^2$, respectively. Let us suppose that the average conditional density (supposed to be self-averaging at all scales considered) is a power law, i.e.

$$\overline{n_{1,2}(r)} = \frac{\overline{N(r, \Delta r)}}{V(r, \Delta r)} = \frac{(3 - \gamma) B_{1,2}}{4\pi}\, r^{-\gamma}, \qquad (22)$$

where $B_1$ is the amplitude in the first volume and $B_2$ that in the second. The estimation of the sample density is

$$n_S^{1,2} = \frac{N}{V(R_s^{1,2})} = \frac{3 B_{1,2}}{4\pi \left(R_s^{1,2}\right)^{\gamma}}. \qquad (23)$$

It is clear that if $B_1 \neq B_2$ there will be large fluctuations between the two volumes on scales close to the sample sizes. However, if $R_s^1 \approx R_s^2 = R_s$, from Eq. 20 we find that the estimator of the two-point correlation function is

$$\xi_{1,2}(r) = \frac{3 - \gamma}{3} \left( \frac{r}{R_s^{1,2}} \right)^{-\gamma} - 1. \qquad (24)$$

This no longer depends on the different amplitudes of the conditional density. That is, despite the difference in the conditional density and in the whole-sample density in the two volumes (which depends on the ratio between $B_1$ and $B_2$), the amplitude of the two-point correlation function does not reflect these (arbitrarily large) variations.

Similarly, in the case $R_s^1 \neq R_s^2$ the difference in amplitude between the estimations of the two-point correlation function in the two volumes is simply

$$\frac{\xi_1(r) + 1}{\xi_2(r) + 1} = \left( \frac{R_s^1}{R_s^2} \right)^{\gamma}, \qquad (25)$$

thus resulting in a relatively small factor when $R_s^1 \approx R_s^2$, even though the difference between $B_1$ and $B_2$ can be arbitrarily large. That is, even though the average density can fluctuate by an arbitrarily large factor, the amplitude of ξ(r) may not show a similar variation. This does not imply, however, that the amplitude measures an intrinsic property of the distribution. Actually, in Eq. 25 the difference in the amplitude is related to the sample sizes. This means that the only unambiguous way to establish whether the average density is a well-defined quantity, hence whether the results obtained by the standard correlation function analysis are meaningful, is represented by the study of conditional fluctuations presented in the previous sections.
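The two-volume example above can be checked numerically with a few lines. The sketch below simply evaluates the power-law conditional density, the corresponding sample density inside a sphere of radius $R_s$, and their ratio; the value γ = 1 and the amplitudes are arbitrary choices for illustration, not measured quantities.

```python
import math


def conditional_density(r, B, gamma):
    """Power-law average conditional density in shells, amplitude B."""
    return (3.0 - gamma) * B / (4.0 * math.pi) * r ** (-gamma)


def sample_density(R_s, B, gamma):
    """Sample density N/V inside a sphere of radius R_s for the same power law."""
    return 3.0 * B / (4.0 * math.pi) * R_s ** (-gamma)


def xi(r, R_s, B, gamma):
    """Two-point correlation function estimated as n(r) / n_S - 1."""
    return conditional_density(r, B, gamma) / sample_density(R_s, B, gamma) - 1.0


if __name__ == "__main__":
    gamma = 1.0  # illustrative exponent
    # Amplitudes differing by a factor 1000 give the same xi(r) ...
    print(xi(10.0, 100.0, 1.0, gamma), xi(10.0, 100.0, 1000.0, gamma))
    # ... while a different sample size shifts the amplitude by (R1/R2)**gamma.
    ratio = (xi(10.0, 100.0, 1.0, gamma) + 1.0) / (xi(10.0, 150.0, 50.0, gamma) + 1.0)
    print(ratio, (100.0 / 150.0) ** gamma)
```

The amplitude B cancels exactly in the ratio, so the estimated ξ(r) carries information about the sample size rather than about the density contrast between the two volumes.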

By using different normalizations, which however are all in principle equally valid if the distribution has a well-defined average density inside the sample, we have shown that the amplitude of the estimated correlation function varies in the SDSS samples. This occurs because both the assumptions on which the determination of the standard two-point correlation function is based are not verified in these samples, and $\lambda_0$ is certainly larger than the sample size.

Finally we note that not only the amplitude but also the shape of the correlation function is affected by the normalization to a sample average that largely differs from the ensemble average one. The shape, however, is strongly biased only at large separations, when $\xi(r) \lesssim 1$, i.e. when the first term in the r.h.s. of Eq. 24 becomes comparable to the second one. We refer the interested reader to Sylos Labini et al. (2009d) for a more detailed discussion of the determination of the standard estimators (i.e., the LS and DP estimators) of the two-point correlation function in these samples.

4.9. The SDSS Great Wall and other structures 

As mentioned above, the measurements of the M(r) values of $N_i(r)$ on the scale r allow derivation of many interesting properties of the structures in these samples. Beyond the statistical properties already described, it is interesting, for example, to consider the density profile derived from $N(r; R_i)$. An example is shown in Fig. 16, which displays the behavior of $N(r; R_i)$ in the sample VL2 (with K-corrections) in the three different regions for r = 10 Mpc/h. This analysis is more powerful than the simple counting as a function of radial distance in tracing large-scale galaxy structures. Indeed, one may precisely describe the sequence of structures and voids characterizing the samples and, by changing the sphere radius r, one may determine the situation at different spatial resolutions. For instance, the distribution in the angular region R3 (see the bottom panel of Fig. 16) is dominated by a single large-scale structure, which is known as the SDSS Great Wall (Gott et al., 2005). In the R2 and R3 regions, one is also able to isolate structures well at different distances, while in the R1 region, which covers a solid angle about six times larger than the other two sky areas, the signal is determined by the superposition of different structures of different amplitude and on different scales. In the latter case, it would be useful to divide the sample into smaller angular slices.

For a sample of arbitrary geometry, $R_s$ is defined as the radius of the largest sphere fully contained in the sample volume (Gabrielli et al., 2005).

Fig. 16. Behavior of $N(r; R_i)$ in the K-corrected VL2 sample in the three different regions for r = 10 Mpc/h (R1 top, R2 middle, and R3 bottom).

Fig. 17. Projection on the X-Z plane of R3VL2. The SDSS Great Wall is the filament in the middle of the sample.

In Fig. 17 we show the projection on the X-Z plane of R3VL2, where the SDSS Great Wall is placed in the middle of the sample, and it is clearly visible as a coherent structure of large amplitude, similar to a mountain chain, extending over the whole sample. The information contained in the $N(r; R_i)$ data allows a quantitative determination of the properties of this structure in an unambiguous way, as we discussed above. For instance, by a simple visual comparison of the profiles in the different angular regions we can conclude that, although the Great Wall is a particularly long filament of galaxies, it represents a typical persistent fluctuation in the samples' volume.

In addition it is interesting to consider the full $N_i(r) = N(r; x_i, y_i, z_i)$ data, where $(x_i, y_i, z_i)$ are the Cartesian coordinates of the $i$-th center. To this aim we chose a three-dimensional representation where on the bottom plane we use the x, z Cartesian coordinates of the sphere center and on the vertical axis we display the intensity of the structures, i.e., the conditional number of galaxies contained in the sphere of radius r. (In the y direction the thickness of the sample is small, i.e., Δy ≈ 15 Mpc/h.) This is shown in Fig. 15 for r = 10 Mpc/h. One may note that the SDSS Great Wall is clearly visible as a coherent structure similar to a mountain chain, extending all over the sample. It is worth noticing that profiles similar to those shown in Figs. 15-16 have also been found in the 2dFGRS (Sylos Labini et al., 2009a,b), supporting the conclusion that the fluctuations we have identified in this catalog are typical of the galaxy distribution.



4.10. Role of spatial correlations 

To show that the large-scale fluctuations in the galaxy density field we have detected are genuinely due to long-range spatial correlations and not to some selection effects, we performed the following test. In a given VL sample we assigned to each galaxy a redshift extracted randomly from the list of redshifts of the galaxies in the same sample. In this way the angular coordinates of each object are fixed and its redshift is randomized, while the redshift distribution in the sample is kept fixed. This operation washes out the intrinsic spatial correlations of the galaxy distribution, but conserves the main observational coordinates (i.e., angular positions and redshifts). Thus the result of this test may tell us whether fluctuations and structures are an effect of spatial correlations. The result is that the signal in $N_i(r; R)$ is substantially washed out, as one may notice by comparing Figs. 15 and 18.
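A minimal sketch of this randomization is given below. It assumes that the redshifts are reassigned by a random permutation, which keeps the redshift distribution exactly fixed; whether the original test used a permutation or a draw with replacement is not stated in the text. The tuple layout `(ra, dec, z)` and the function name are hypothetical.

```python
import random


def shuffle_redshifts(galaxies, seed=0):
    """Return a copy of the catalog with redshifts permuted among galaxies.

    galaxies is a list of (ra, dec, z) tuples. Angular positions are left
    untouched and the set of redshift values is preserved exactly.
    """
    rng = random.Random(seed)  # seeded generator for reproducibility
    redshifts = [z for _, _, z in galaxies]
    rng.shuffle(redshifts)
    return [(ra, dec, z_new) for (ra, dec, _), z_new in zip(galaxies, redshifts)]


if __name__ == "__main__":
    catalog = [(10.0, 1.0, 0.01), (20.0, 2.0, 0.02), (30.0, 3.0, 0.03)]
    print(shuffle_redshifts(catalog, seed=1))
```

Because only the pairing between angular position and redshift is destroyed, any structure whose extent is comparable to the sample itself still leaves an imprint on the redshift distribution and therefore survives in the randomized catalog.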

However, it should be stressed that, if there are structures of spatial extension comparable to the sample size, these will not be completely washed out by the randomization adopted, given that the redshift distribution is kept fixed. Indeed this is the case for the SDSS Great Wall, contained in the sample R3VL2. In Fig. 18 the structure is almost completely washed out, but as it is as large as the sample, there is a residual in the randomized version. By means of the statistical analysis shown in Fig. 19 we find that the PDF of conditional fluctuations becomes very peaked in the randomized sample; i.e., it tends to a Gaussian function, while in the real sample it displays a long tail for high N values, corresponding, as discussed above, to the large fluctuations present in this sample. In addition, the conditional density (i.e., the conditional average number of points in spheres given by Eq. 10 divided by the volume of a sphere of radius r) becomes flat for the randomized sample, signaling the absence of correlations, while it was a power law in the real sample with an exponent approximately equal to 0.8 ± 0.1. Because the redshift distribution is kept fixed in the randomized sample, its PDF will converge to the one of the real sample by considering larger sphere radii.

Fig. 18. As in Fig. 15 but for the randomized VL2 samples, as described in the text.

Fig. 19. Upper panels: the PDF of conditional fluctuations in spheres of radius r = 5 Mpc/h (left) and r = 10 Mpc/h (right) for the real sample (Real) and the randomized one (RND), as explained in the text. Bottom panel: conditional density as a function of scale.

5. Comparison with theoretical models 

Let us now discuss the problem of comparing the statistical analysis of real galaxy samples with theoretical predictions. In this respect it is useful to remember that theoretical models predict that, by gravitationally evolving a density field compatible with observations of the cosmic microwave background radiation (CMBR) anisotropies, there is a maximum scale up to which nonlinear structures have formed at the present time. The precise value of such a scale depends on the details of the initial correlations in the density field and on the values of the cosmological parameters, and it is roughly placed at about $\lambda_0 \approx 10$ Mpc/h in the ΛCDM concordance model (see Springel et al., 2005). On scales $r > \lambda_0$ the average density becomes well-defined as long as its fluctuations become small enough. As discussed in Sect. 2, this scale may be defined as the one at which the variance of the fluctuations is twice the square of the asymptotic (large-scale) average density. Then for scales $r > \lambda_0$ the situation is simple: gravitational clustering has linearly amplified initial fluctuations and thus correlation properties reflect those at the initial time. The linear amplification factor can be easily computed by making a perturbation analysis of the self-gravitating fluid equations (i.e., Vlasov-Poisson equations) in an expanding universe.

There is no full analytical understanding of the properties of self-gravitating particles in the nonlinear phase occurring for scales $r < \lambda_0$; for this reason, gravitational N-body simulations generally represent the means to study these structures. A gravitational N-body simulation follows the motion of particles (supposed in cosmology to be dark matter particles) moving under the effect of their self-gravity in an expanding universe. By normalizing the initial amplitude of fluctuations and the density correlations to the observations of the CMBR anisotropies, one finds that there is a well-defined time scale that allows one to define the present time at which the simulation is stopped. In this context it is worth remembering that, in CDM-like models, gravitational clustering builds up nonlinear structures in a bottom-up way because of the small initial velocity dispersion.

An N-body simulation provides the distribution of dark matter particles and not that of galaxies. It is here that the problem of sampling, or biasing, is relevant. Indeed galaxies are supposed to form on the highest peaks of the dark matter density field, so one has to define the rules to make a correlated sampling of the dark matter particles to identify "mock" galaxies. There are different sampling procedures in the literature, and they are the outcome of the so-called semi-analytic models of galaxy formation. Generally these sampling procedures only modify correlations at small scales, i.e., they are local samplings. Only nonlocal sampling procedures may give rise to different correlation properties on large scales. However, no form of currently known galaxy bias can produce the large-scale fluctuations we observe in the catalog. Indeed the currently accepted theoretical model of biasing (Kaiser, 1984) predicts that, when clustering is in the linear phase, threshold sampling of the highest peaks in a Gaussian density field gives rise to a simple linear amplification of fluctuations and of their correlations. This situation is expected to hold for scales r > 10 Mpc/h, where density fluctuations in N-body simulations of the dark matter field are in the linear regime, the PDF of fluctuations is Gaussian, and thus biasing is linear. In these conditions, there is a simple relation between mock galaxies and dark matter particle correlation properties. Thus complications with respect to this simple picture are expected only on small scales.

We use a semi-analytic galaxy catalog constructed from the Millennium ΛCDM N-body simulation (Springel et al., 2005). To construct mock samples corresponding to the SDSS VL samples, we used the full version of the catalog in the ugriz filter system. The catalog contains about 9 million galaxies in a 500 Mpc/h cube. We used the absolute magnitudes in the r filter, the one used in the SDSS case, to construct the mock samples with the same limits in absolute magnitude as for the SDSS VL samples with K-corrections. In Table 5 we report the properties of the mock samples: $R_{min}$, $R_{max}$ (in Mpc/h) are the chosen limits for the metric distance; $M_{min}$, $M_{max}$ define the interval for the absolute magnitude in each sample; $\alpha_{max}$, $\alpha_{min}$ (in degrees) are the limits in right ascension; $\delta_{max}$, $\delta_{min}$ (in degrees) are the limits in declination; $N_z$ is the number of objects in the sample in redshift space and $N_r$ the same for the sample in real space. We construct only the mock samples corresponding to VL1, VL3 and VL5. The volume of the samples is constrained to be the same as, or similar to, the volumes of the real SDSS samples. In particular, the sample region can easily be fitted in the simulation cube in the case of VL1. For VL3 it should be slightly reduced in declination, while for VL5 the range in declination is reduced significantly (see Table 5).

The PDF of conditional fluctuations (see Fig. 20) shows a clear departure from a Gaussian function for r < 10 Mpc/h, while it rapidly approaches the Gaussian function for r > 20 Mpc/h. For r > 5 Mpc/h there is no detectable difference between the real- and redshift-space cases. Additionally, for r < 10 Mpc/h, the PDF exhibits a large-N tail that is the signature of the correlations present at those scales.

In addition, the analysis of the PDF in two disconnected
subvolumes of the mock samples does not show
any trend toward non-self-averaging: the two PDFs coincide
in a statistical sense (see Fig. 21). That is, in contrast to
the case of real galaxy samples, the simulations are
self-averaging.

See http://www.mpa-garching.mpg.de/galform/agnpaper/
for semi-analytic galaxy data files and description, and see
http://www.mpa-garching.mpg.de/millennium/ for information
on the Millennium ΛCDM N-body simulation.

The difference between real and redshift-space is due to
peculiar velocities.

Fig. 21. The self-averaging test for the mock catalogs:
analysis of the PDF in two disconnected subvolumes
(S1 and S2) of the mock sample R1VL3 in redshift-space.
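The self-averaging test described above can be sketched in a few lines. This is a minimal illustration, not the pipeline used for the figures: the function names, the brute-force neighbor search, the split of the sphere centers at a plane x = x_cut, and the toy point set are all assumptions made here, and boundary effects (spheres partially outside the sample volume) are ignored.

```python
import math


def conditional_counts(points, r):
    """N_i(r): number of other points within distance r of each point i (brute force)."""
    counts = []
    for i, center in enumerate(points):
        n = sum(
            1
            for j, other in enumerate(points)
            if j != i and math.dist(center, other) <= r
        )
        counts.append(n)
    return counts


def count_pdf(counts, n_max):
    """Normalized histogram of counts for N = 0..n_max (larger values fall in the last bin)."""
    hist = [0] * (n_max + 1)
    for n in counts:
        hist[min(n, n_max)] += 1
    return [h / len(counts) for h in hist]


def subvolume_pdfs(points, r, x_cut, n_max):
    """Split the sphere centers at x = x_cut and return the PDF measured in each half.

    Assumes that both halves contain at least one center.
    """
    counts = conditional_counts(points, r)
    near = [n for p, n in zip(points, counts) if p[0] < x_cut]
    far = [n for p, n in zip(points, counts) if p[0] >= x_cut]
    return count_pdf(near, n_max), count_pdf(far, n_max)


# Toy sample (hypothetical): two identical triplets of points, one in each half.
toy_points = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (10, 0, 0), (11, 0, 0), (12, 0, 0)]
pdf_near, pdf_far = subvolume_pdfs(toy_points, r=1.5, x_cut=5.0, n_max=3)
```

In this toy case the two subvolume PDFs are identical by construction, which is the self-averaging situation; systematic differences between the two histograms, beyond their statistical scatter, are what signal a breaking of self-averaging.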

Figure 22 shows the behavior of the conditional average
density in the mock samples. The main difference
between real and redshift-space occurs at scales r < 5
Mpc/h, where the redshift-space exponent is systematically
smaller than the real-space one. For the real
galaxy samples we cannot perform the real-space analysis
because galaxy peculiar velocities are not known. We note
that the same finite-size effects that perturb the
redshift-space reduced two-point correlation function may
affect the projected one, and thus the whole method used to
infer the real-space correlation function from the
redshift-space one (Vasilyev et al. 2006). In addition, both in real
and redshift-space, the exponent is smaller when the average
galaxy luminosity increases, a trend that is not
as well-defined in the real data as shown by FiglHl In 
the mock catalogs, the power-law behavior extends up
to ~ 20 Mpc/h, beyond which there is a well-defined
crossover, which corresponds to the scale where the PDF of
conditional fluctuations approaches the Gaussian function.
Thus, while the exponent of the conditional density
in redshift-space is closer, up to r ~ 10 Mpc/h, to what is
observed in the real galaxy data, the mock samples show a
clear difference for r > 20 Mpc/h, in that the crossover to
homogeneity is well-defined and the distribution does not
present large-scale fluctuations similar to those characterizing
the SDSS galaxy distribution. In addition, redshift
and real-space properties are indistinguishable for r > 10
Mpc/h.
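The exponent of the conditional density discussed above can be estimated by a straight-line fit in log-log space. The sketch below is only an illustration of that standard procedure: the function name and the synthetic input are assumptions, and the fit carries no error weighting.

```python
import math


def power_law_fit(r_values, density_values):
    """Least-squares fit of log10(n) = log10(A) + gamma * log10(r).

    Returns (gamma, A), where gamma is the power-law exponent.
    """
    xs = [math.log10(r) for r in r_values]
    ys = [math.log10(n) for n in density_values]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    gamma = s_xy / s_xx
    amplitude = 10 ** (y_mean - gamma * x_mean)
    return gamma, amplitude


# Synthetic check (hypothetical data): n(r) = 3 / r should give gamma = -1, A = 3.
radii = [1.0, 2.0, 5.0, 10.0, 20.0]
gamma, amplitude = power_law_fit(radii, [3.0 / r for r in radii])
```

Restricting the fitted range of radii (for example to scales below the crossover) is left to the caller, since the appropriate range depends on the sample.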



As a final remark, we note that Einasto et al. (2006a)
find that the fraction of very luminous (massive)
superclusters in real samples extracted from the 2dFGRS and
from the SDSS (Data Release 4) is more than ten times
greater than in simulated samples constructed from the
Millennium simulations (see also Einasto et al. 2006b,
2008). Our results are compatible with these findings.

Sample  R_min    R_max    alpha_min  alpha_max  delta_min  delta_max  N_r    N_z
        (Mpc/h)  (Mpc/h)  (deg)      (deg)      (deg)      (deg)
VL1     50       200      24.0       66.0       -48.0      32.5       53423  54555
VL3     125      400      24.0       66.0       -45.0      30.0       74645  74170
VL5     200      600      24.0       66.0       -24.5      24.5       15572  15571

Table 6. Main properties of the obtained mock VL samples.

Fig. 22. Conditional density in the mock R1VL1, R1VL3,
and R1VL5 samples in real space (VL1r, VL3r, VL5r) and
redshift-space (VL1z, VL3z, VL5z). The bottom-left panel
compares the behaviors in
the different samples in redshift-space, where it is evident
that the exponent becomes steeper for brighter objects.
The normalization is chosen to give the same large-scale
density.

6. Conclusion 

We have studied the statistical properties of the galaxy
distribution in the SDSS-DR6 sample. This is a brief summary
of our results:

— The probability density function (PDF) of spatial con- 
ditional fluctuations, in volume limited samples and 
filtered on spatial scales r < 30 Mpc/h, shows a long 
tail, which is the signature of the large-scale structures 
in these samples. 

— The PDF of conditional fluctuations does not converge 
to a Gaussian function even for large-sphere radii (i.e., 
r > 30 Mpc/h). 

— The PDF of conditional fluctuations is unaffected by 
K and (standard) evolutionary corrections. 

— The PDF of conditional fluctuations, filtered on spa- 
tial scales r > 30 Mpc/h, does not show self- 
averaging properties when this is computed in two non- 
overlapping samples of equal volume. 



— The whole-sample averaged conditional density shows 
scaling properties up to ~ 30 Mpc/h, the largest scale
at which this statistic shows self-averaging properties
(i.e., where the PDF is statistically stable inside the
sample).

— The normalization of the luminosity-redshift function 
and that of the two-point correlation function are affected
by systematic effects, and thus do not provide meaningful
information.

— The PDF of conditional fluctuations in mock galaxy 
catalogs rapidly converges to a Gaussian function for 
r > 10 Mpc/h so that structures predicted by theoret- 
ical models are at odds with observations. 

— The above behaviors are compatible with galaxy 
counts as a function of apparent magnitude and of
redshift in the magnitude-limited sample. Indeed these
show fluctuations on large spatial scales of ~ 10-20%,
which are persistent in the sample volume.

In summary, the primary conclusion of this work is
that in the SDSS-DR6 we find large-scale galaxy structures
that correspond to density fluctuations of large amplitude
and large spatial extension, whose size is limited only
by the sample size. Because of these large fluctuations
in the galaxy density field, self-averaging properties are
well-defined only on scales r < 30 Mpc/h: in this range
we find scaling properties of the conditional density, which
decays as a power law as a function of scale with an exponent
around minus one (see Fig. 12). Correspondingly, the
PDF of conditional fluctuations P(r; N) presents a stable
shape for scales r < 30 Mpc/h, characterized by a long
tail for high N values: this "fat tail" is the signature of
the structures present in these samples. Instead, for r > 30
Mpc/h, the PDF converges neither to a Gaussian function
nor to a well-defined shape, showing clear evidence of
non-self-averaging properties of conditional fluctuations.

We interpret this as caused by a systematic effect,
in that sample volumes are not large enough for conditional
fluctuations, filtered on such large scales, to be
self-averaging; i.e., they do not contain enough structures and voids
of large size to allow a reliable determination of average
(conditional) quantities. This result implies, for instance,
that the average behaviors of both magnitude and redshift
distributions in the magnitude-limited sample are biased
by large spatial fluctuations, and thus that their variance
only represents a lower limit to the real intrinsic variance.
Furthermore, we discussed that K and standard evolutionary
corrections (Blanton et al. 2003) do not qualitatively
affect these behaviors. We pointed out the problems
related to the estimation of the amplitude of fluctuations
and of correlation properties from statistical quantities
that are normalized to the estimated
sample average. As long as a distribution inside the given
sample is not self-averaging, hence not homogeneous, the
estimation of the two-point correlation function is necessarily
biased by strong finite-size effects. Our results are
compatible with a continuation of the power-law behavior
of the conditional density on scales larger than 30 Mpc/h
and incompatible with homogeneity on scales smaller than
~ 100 Mpc/h. Only the availability of larger samples will
allow average correlation properties to be determined on scales
larger than ~ 30 Mpc/h.

Our results, because they imply that the galaxy distribution
is inhomogeneous on scales of ~ 100 Mpc/h, are perfectly
compatible with a "Copernican" principle. Indeed
any statistically stationary inhomogeneous point distribution
is compatible with the principle that there is no special
point or direction in the universe. If for instance there
were a "local hole" or a particular large structure around
us, this would not imply that the "Copernican" principle
is violated, but simply that the distribution is spatially
inhomogeneous (Joyce et al. 2000). These are two different
properties which are sometimes confused in the literature
(Ellis 2008; Clifton et al. 2008).

Finally we found that fluctuations in mock galaxy catalogs
are Gaussian for r > 20 Mpc/h, implying that our
results are at odds with the predictions of the concordance
ΛCDM model of galaxy formation. This result remains the
same when considering redshift-space fluctuations (as for
the real data) or real-space ones. Indeed we find that the
main difference going from real to redshift-space occurs
for scales smaller than ~ 5 Mpc/h, where the exponent of
the conditional density passes from -1.8 to about -1.

Our results are compatible with a series of analyses
of galaxy number counts in different catalogs,
e.g., APM (Shanks 1990; Maddox et al. 1990), 2dFGRS
(Busswell et al. 2003), 2MASS (Frith et al. 2003), and
a sample of galaxies in the H band (Frith et al. 2006).
In all those surveys, count fluctuations not normalized to
the sample average have been considered, and it was concluded
that there are local fluctuations of ~ 30% extending
over scales of ~ 200 Mpc/h, which are at odds with
ΛCDM predictions. Furthermore our results are compatible
with the results by Loveday (2004) on the SDSS-DR1
sample, although their interpretation is different.

Similar persistent spatial fluctuations in the galaxy
density field have been found in the 2dFGRS by
Sylos Labini et al. (2009a,b): this shows that these fluctuations
are quite typical of the galaxy distribution.

In addition, the comparison of the model predictions
with real galaxy data through the analysis of mock
galaxy catalogs, which we have discussed, agrees with that
of Einasto et al. (2006a,b, 2008), who found, for instance,
that superclusters in real samples extracted from the
2dFGRS and from the SDSS are more than ten times
larger than in simulated samples constructed from the
Millennium simulations. A similar conclusion was reached
by Sylos Labini et al. (2009a,b) on the 2dFGRS.

Appendix A: Cosmological corrections

In this appendix we discuss in some detail the problem of the
cosmological corrections to be applied to the data.
For each galaxy one observes, among other quantities,
the angular coordinates, the redshift z, and the apparent
magnitude $m_r$. From these data we aim to construct
three-dimensional samples that are not affected by
observational selection effects. It is observationally established
that the galaxy redshift is linearly proportional to its
distance, i.e., the Hubble law (Hubble 1929) $R = (c/H_0)\,z$,
where c is the speed of light and $H_0$ is the Hubble constant.
In the framework of the Friedmann solutions of the
Einstein field equations, the linearity of the Hubble law
(Peebles 1980) is verified only for very low redshifts. In
general, the (metric) distance R depends on the values of
cosmological parameters such as the mass density $\Omega_m$ and
the cosmological constant $\Omega_\Lambda$, so that $R = R(z; \Omega_m, \Omega_\Lambda)$
(see Hogg 1999). These formulas introduce second-order
corrections to the linear law that are generally unimportant
at low redshifts, e.g., z < 0.2, such as the ones we
consider in what follows.

In what follows we denote the Hubble constant as $H_0 = 100\,h$
km/sec/Mpc, where h is a parameter in the range 0.5 <
h < 0.75 according to observations (Freedman et al. 2001).
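To give a feeling for the size of the correction to the linear law at z = 0.2, the sketch below compares the Hubble-law distance with the distance in a flat Friedmann model. The parameter values $\Omega_m = 0.3$, $\Omega_\Lambda = 0.7$, the identification of the metric distance with the comoving distance integral, and the function names are assumptions made here for illustration; distances are in Mpc/h.

```python
import math

C_OVER_H0 = 2997.92458  # c / H0 in Mpc/h for H0 = 100 h km/s/Mpc


def hubble_distance(z):
    """Linear Hubble law, R = (c / H0) * z."""
    return C_OVER_H0 * z


def metric_distance(z, omega_m=0.3, omega_lambda=0.7, steps=1000):
    """Comoving distance in a flat model (omega_m + omega_lambda = 1), via Simpson's rule.

    R(z) = (c / H0) * integral_0^z dz' / sqrt(omega_m * (1 + z')**3 + omega_lambda)
    """
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of intervals

    def integrand(x):
        return 1.0 / math.sqrt(omega_m * (1.0 + x) ** 3 + omega_lambda)

    h = z / steps
    total = integrand(0.0) + integrand(z)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * integrand(k * h)
    return C_OVER_H0 * total * h / 3.0


linear = hubble_distance(0.2)
curved = metric_distance(0.2)
relative_difference = (linear - curved) / linear
```

With these assumed parameters the two distances differ by a few percent at z = 0.2 and agree to better than a part in a thousand at z = 0.001, which is the sense in which the corrections are second order.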

In order to reconstruct the absolute magnitude from
the apparent one we need to determine the so-called
K-correction. This correction must be applied because galaxies
observed at different redshifts are sampled, by any particular
instrument, at different rest-frame frequencies. The
transformations between observed and rest-frame broad-band
photometric measurements involve terms known as
K-corrections (Humason et al. 1956; Hogg et al. 2002).
In general, if the galaxy spectrum is known, we can calculate
precisely what the K-correction is. While in the
past (see Joyce et al. 1999b) these were known only in an
average way, in the case of the SDSS it is possible to reconstruct
the K-correction for each object from the measurements
of galaxy magnitudes in different frequency bands
(Blanton et al. 2005).
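As an illustration of where the K-correction enters, the sketch below applies the standard distance-modulus relation, $M = m - 5\log_{10} d_L - 25 - K(z)$ with $d_L$ in Mpc. The relation $d_L = R\,(1+z)$ assumes a spatially flat model, and the function and argument names are hypothetical; the K-correction itself is taken as an input, since its computation from the multi-band photometry is not reproduced here.

```python
import math


def absolute_magnitude(m_app, R, z, k_corr=0.0):
    """Absolute magnitude from apparent magnitude.

    m_app  : apparent magnitude
    R      : metric (comoving) distance in Mpc (or Mpc/h, giving M - 5 log10 h)
    z      : redshift
    k_corr : K-correction K(z) in magnitudes, supplied by the caller
    """
    d_lum = R * (1.0 + z)  # luminosity distance, flat-model assumption
    return m_app - 5.0 * math.log10(d_lum) - 25.0 - k_corr


# Sanity check: an object of apparent magnitude 10 at 10 Mpc and z = 0.
m_check = absolute_magnitude(10.0, 10.0, 0.0)
```

An E-correction, when applied, would enter as one more additive term of the same kind.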

Another correction that has to be considered to determine
the absolute magnitude from the apparent one
is related to the way galaxies have evolved from high
to low redshifts (Kauffmann et al. 2003; Blanton et al.
2003). We expect that this is a relatively minor problem in
our studies because the maximum redshift we consider is
z = 0.2 and evolutionary corrections (or E-corrections) are
generally believed to be small and linearly proportional to
the redshift. Indeed, as we discuss in what follows, these
corrections may only play a role for very bright objects,
which can be observed far away from us. There is no
well-accepted model for galaxy evolution, and in what follows
we adopt the corrections that are usually used in the
literature (see Blanton et al. 2003; Tegmark et al. 2004).

Being applied in an average way, these corrections have
the disadvantage of not taking the galaxy type into account:
spiral, elliptical and irregular galaxies should
in principle have different star-formation histories and thus
different corrections (Yoshii and Takahara 1988).

It is, however, worth commenting on the derivation of
the average evolution corrections by Blanton et al. (2003).
These have been derived by assuming that the space density
is constant (i.e., uniformity), by including the effect
of large-scale fluctuations in some ad-hoc parameters
of a phenomenological behavior of the luminosity-redshift
function, and by assuming that unknown evolutionary
factors may explain the residual behaviors that
are not taken into account by those parameters (see
Eq. 5 in Blanton et al. (2003) and Lin et al. (1999)). Thus
the results for the amount of evolution are based on very
strong assumptions, with the following consequence:
any deviation from uniformity on a large scale that is
not properly described by the assumed phenomenological
luminosity-redshift function appears as a sign of galaxy
evolution; that is, galaxy evolution corrections were not
measured in a way that is free of a priori assumptions.

Because there is no well-defined way to describe
E-corrections, we use the same type of average functional
behavior adopted by other authors (see Tegmark et al.
2004) to reach, in the same samples we consider, conclusions
that are substantially different from ours. We find
that the results for the PDF of conditional fluctuations
are basically unaffected. Although this does not strictly
imply that evolution is not playing any role, it does
imply that galaxy fluctuations are not self-averaging and
that the galaxy distribution is not uniform in these samples,
at least not in the range of scales that we define properly
below.

Thus the question remains open of whether some more
detailed evolutionary corrections can qualitatively change
the results we get. The basic issue to be considered in this
respect is that we mainly focus on the PDF of conditional
fluctuations. While the E-corrections may change average
behaviors as a function of scale, it is unlikely that they
can produce the large-amplitude fluctuations of large spatial
extension that we observe. In what follows we present
specific tests computing the effect of average evolution
corrections on the relevant statistical quantities we measure.



Appendix B: Number counts in the magnitude
limited sample

The advantage in using the magnitude limited sample is
that one only considers directly observed quantities, i.e.,
$m_r, z, \eta, \lambda$, without K and E corrections, which introduce
additional hypotheses about the shape of the galaxy
spectrum and the evolution process. Here we determine
galaxy counts as a function of the apparent magnitude
and the redshift distribution in the ML sample, also
determining their typical fluctuations. Given the spread in
the galaxy luminosity function, it is not straightforward
to derive precise information on spatial fluctuations and
their correlations from these measurements.



B.1. Magnitude counts

The analysis of galaxy counts as a function of apparent
magnitude allows us to make an estimate independent
of those based on the three-dimensional analysis of
fluctuations in the VL samples. However, this analysis
does not allow us to disentangle luminosity selection
effects from spatial fluctuations. By studying the variance
of counts we are only able to estimate real-space fluctuations
indirectly.

We first divide the angular region of the survey into
$N_f = 20$ angular subregions of almost equal solid angle.
In the $i$-th subregion, of solid angle $\Omega_i$, we compute the
differential counts of galaxies $n_i(m, \Delta m)$ in magnitude bins
of size $\Delta m = 0.25$. We then compute the average

$$\overline{n}(m, \Delta m) = \frac{1}{N_f} \sum_{i=1}^{N_f} n_i(m, \Delta m) \qquad \mathrm{(B.1)}$$

and we estimate the variance

$$\Sigma^2(m, \Delta m) = \frac{1}{N_f - 1} \sum_{i=1}^{N_f} \left( n_i(m, \Delta m) - \overline{n}(m, \Delta m) \right)^2 . \qquad \mathrm{(B.2)}$$

The normalized variance is

$$\sigma^2(m, \Delta m) = \frac{\Sigma^2(m, \Delta m)}{\overline{n}(m, \Delta m)^2} . \qquad \mathrm{(B.3)}$$

Fig. B.1. Average differential number counts of galaxies
as a function of apparent magnitude in bins of
$\Delta m_r = 0.25$ per unit solid angle in deg$^2$. The best fit with
a behavior of type $10^{\alpha m}$ is shown for $\alpha \approx 0.57 \pm 0.01$.

Fig. B.2. Behavior of $\sigma(m, \Delta m)$ as a function of apparent
magnitude (i.e., Eq. B.3) in bins of size $\Delta m_r = 0.25$ per
unit solid angle in deg$^2$.
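Equations B.1-B.3 amount to a sample mean, an unbiased sample variance, and their ratio, evaluated separately in each magnitude bin. A minimal sketch for a single bin follows; the function name and the three-field toy input are assumptions made for illustration.

```python
import math


def count_statistics(field_counts):
    """Mean, variance and normalized variance of the counts in one magnitude bin.

    field_counts holds n_i(m, dm) for the N_f angular subregions.
    """
    n_fields = len(field_counts)
    mean = sum(field_counts) / n_fields                                   # Eq. B.1
    variance = sum((n - mean) ** 2 for n in field_counts) / (n_fields - 1)  # Eq. B.2
    normalized_variance = variance / mean ** 2                            # Eq. B.3
    return mean, variance, normalized_variance


# Toy example with three fields (the analysis in the text uses N_f = 20).
mean, variance, norm_var = count_statistics([8.0, 10.0, 12.0])
sigma = math.sqrt(norm_var)
```

For the toy input the normalized standard deviation is 0.2, i.e. a 20% field-to-field fluctuation.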

Fig. B.1 shows the average differential number counts of
galaxies as a function of apparent magnitude, which agrees
nicely with the determination by Strauss et al. (2002). In
particular the counts grow as $10^{\alpha m}$ with $\alpha \approx 0.57 \pm 0.01$
in the magnitude range [14, 17].



In Fig. B.2 we present the behavior of $\sigma(m, \Delta m)$ as
a function of apparent magnitude. The fast decay of
$\sigma(m, \Delta m)$ at bright magnitudes (i.e., $m_r < 14$) comes
from the dominance of Poisson noise over the intrinsic variance
of the distribution. It is in fact simple to show that
for a perfectly Poisson distribution of galaxies (i.e., without
any spatial correlation), we get

$$\sigma(m) \sim 10^{-\beta m} \qquad \mathrm{(B.4)}$$

with $\beta = 0.3$. The parameter $\beta$ is in general determined
by the decay of spatial correlations. For correlated distributions,
the decay is slower than for the Poisson case, i.e.,
$\beta < 0.3$, but it is not straightforward to relate the parameter
$\beta$ to the exact value of the correlation exponent in
real-space (Gabrielli et al. 2005).
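The Poisson value $\beta = 0.3$ can be recovered in one line if one additionally assumes the Euclidean count slope $\alpha = 0.6$ for an uncorrelated, uniform distribution; this choice of $\alpha$ is an assumption made here for illustration. For Poisson counts the variance is proportional to the mean (with a constant set by the solid angle of the fields), so that

```latex
\Sigma^2(m,\Delta m) \propto \overline{n}(m,\Delta m)
\;\Longrightarrow\;
\sigma^2(m,\Delta m) = \frac{\Sigma^2(m,\Delta m)}{\overline{n}(m,\Delta m)^2}
\propto \frac{1}{\overline{n}(m,\Delta m)} \propto 10^{-\alpha m}
\;\Longrightarrow\;
\sigma(m) \propto 10^{-\alpha m/2} = 10^{-0.3\,m}
\quad (\alpha = 0.6) .
```

With the measured slope $\alpha \approx 0.57$ the same argument would give a slightly smaller Poisson exponent, $\alpha/2 \approx 0.285$.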



One may note from Fig. B.2 that, for magnitudes
fainter than $m_r \sim 14$, there are fluctuations of ~ 15%
up to the faintest magnitude limit of the survey, i.e.,
$m_r = 17.77$: these are thus persistent up to the deepest
scales observed. This result is not unexpected, because
for many years relatively large fluctuations have been detected
by different authors in many different catalogs. For
instance, by studying the POSS-II photographic plates,
fluctuations of ~ 30% in the surface galaxy density were
observed in the magnitude range 16.5-19 in the r
filter (Picard 1991), although calibration and systematic
errors could affect the photometric determinations from
the photographic plates (Weir et al. 1995).

Furthermore, a deficiency of bright galaxies around
the south galactic pole was first examined by Shanks
(1990) and then by Maddox et al. (1990), who observed
a large deficit in the number counts (50% at B = 16,
30% at B = 17) over a 4000 deg$^2$ solid angle. More
recently, in a CCD survey of bright galaxies within the
Northern and Southern strips of the 2dFGRS, conclusive
evidence was found that there are fluctuations of about
30% in galaxy counts as a function of apparent magnitude
(Busswell et al. 2003). In addition Frith et al. (2003), using
the bright galaxy counts from the 2 Micron All Sky
Survey, found results indicating a very large 'local hole'
in the Southern Galactic Cap (SGC) to > 150 Mpc/h
with a linear size across the sky of ~ 200 Mpc/h, suggesting
the presence of a potentially huge contiguous void
stretching from south to north, and indicating the possible
presence of significant correlations on scales of the
order of 300 Mpc/h. Similarly, by studying H-band number
counts over 0.30 deg$^2$ to H = 19, as well as H < 14
counts from 2MASS, Frith et al. (2006) concluded that these
counts represent a 4.0 sigma fluctuation, implying a local hole that
extends over the entire local galaxy distribution and
is at odds with ΛCDM predictions.
We investigate in Sec. 4, by using the real-space analysis,
the relation between these measurements and fluctuations
in real-space, trying to determine whether the above
estimation of the normalized variance is a reliable statistical
measurement of the intrinsic variance of the distribution
or whether there is a systematic effect that may
reduce, or enlarge, the fluctuations measured in this way
(see Sylos Labini et al. 2009c,a,b).



It is worth noticing that Yasuda et al. (2001) measured
bright galaxy number counts in two independent
stripes of imaging scans along the celestial equator, one
toward the north and the other toward the south galactic
cap, covering about 230 and 210 square degrees respectively,
from imaging data taken during the commissioning
phase of the SDSS. They find that the counts from the two
stripes differ by about 30% at magnitudes brighter than
15.5. Despite the presence of these large fluctuations, they
concluded that the shape of the number counts-magnitude
relation brighter than $m_r = 16$ is characterized by the relation
expected for a homogeneous galaxy distribution in a
"Euclidean" universe (for which $\alpha = 0.6$) (Peebles 1980).
This result is probably affected by the small number of
objects in the bright end of the counts, which indeed does
not exceed a few hundred galaxies (see Tables 2 and 6 of
Yasuda et al. 2001). In addition, they notice that in the
magnitude range $16 < m_r < 21$ the galaxy counts from
both stripes agree very well and follow the prediction of
the no-evolution model, although the data do not exclude
a small amount of evolution. This conclusion thus contrasts
with the one by Loveday (2004) who, as mentioned,
instead invokes a substantial amount of galaxy evolution
to explain the radial counts.

Moreover, it should be noticed that, by measuring the
rms scatter of galaxy number counts in the SDSS-DR1 in
different parts of the sky after correcting for Galactic
extinction, Fukugita et al. (2004) find that this is consistent
with what is expected from the angular two-point correlation
function integrated over circular areas. They did not
analyze the behavior of the rms scatter as a function of
apparent magnitude, i.e., Eq. B.3, and their results show
compatibility of angular correlations with counts fluctuations,
but they do not uniquely constrain spatial correlations.
Indeed angular correlations may be degenerate with
respect to three-dimensional properties (see Durrer et al.
1997; Montuori & Sylos Labini 1997).



B.2. Redshift distribution 

The analysis of the counts of galaxies as a function of
redshift in the full magnitude-limited survey is a complementary
study to the counts as a function of apparent
magnitude. As in the former case, it is difficult to extract
clear information about the correlation properties of the galaxy
distribution. However, the analysis of the redshift distribution
of galaxies in different regions of the sky is a useful
instrument for getting first qualitative information about
the positions, sizes and amplitudes of the spatial galaxy
number fluctuations.

For instance, by studying the redshift distribution in
the Durham/UKST Galaxy Redshift Survey, fluctuations
of 50% were found in the observed radial density function,
occurring on ~ 50 Mpc/h scales (Ratcliffe et al. 1998;
Busswell et al. 2003). In a similar way, in the 2dFGRS
two clear "holes" in the galaxy distribution were detected
in the ranges 0.03 < z < 0.055, with an under-density
of ~ 40%, and 0.06 < z < 0.1, where the density deficiency
is about 25% (Busswell et al. 2003). These
two under-densities, detected in particular in the 2dFGRS
southern galactic cap (SGC), are also clear features in the
Durham/UKST survey. Given that the 2dFGRS SGC field
is entirely contained within the areas of sky observed for
the Durham/UKST survey, the similarities in the redshift
distributions are evidence of the same features in the
galaxy distribution.

Fig. B.3. Differential number counts as a function of redshift,
in bins of $\Delta z = 0.01$, for unit solid angle, in the 3
angular regions R1, R2 and R3.

In Fig. B.3 we show the differential number counts, in
bins of $\Delta z = 0.01$ for unit solid angle, as a function of
redshift in the three angular regions R1, R2, and R3.
Although the three angular regions cover different solid
angles (in particular R1 has a solid angle six times larger
than R2 and R3), it is interesting to note that in R3 there
is a very large fluctuation which, as we discuss in Sect. 2,
corresponds to the famous SDSS Great Wall (Gott et al.
2005). Other structures of smaller amplitude are visible in
R2 and R3, and we present a more detailed analysis below.
Apart from the fluctuations, the behavior of the counts as
a function of redshift involves a convolution with the
luminosity selection of the survey. Thus it generally displays
an asymmetric bell-shaped behavior, where the peak corresponds
to the maximum of the luminosity selection of the
survey (see Busswell et al. 2003).

In Fig. B.4 we show the average differential number
counts, in bins of size $\Delta z = 0.01$, for unit solid angle. This
is computed similarly to the average counts as a function
of apparent magnitude described above. We divide the angular
sky region of the survey into $N_f = 20$ independent
and non-overlapping angular regions (the $i$-th angular region
has solid angle $\Omega_i$). We then compute

$$\overline{n}(z, \Delta z) = \frac{1}{N_f} \sum_{i=1}^{N_f} n_i(z, \Delta z) \qquad \mathrm{(B.5)}$$

where $n_i(z, \Delta z)$ represents the counts in the $i$-th sky region.
In Fig. B.4 the fluctuations again
trace large-scale structures, and the peak at z ~ 0.07 corresponds
to the SDSS Great Wall (Gott et al. 2005).
The redshift counts variance is given by

$$\Sigma^2(z, \Delta z) = \frac{1}{N_f - 1} \sum_{i=1}^{N_f} \left( n_i(z, \Delta z) - \overline{n}(z, \Delta z) \right)^2 . \qquad \mathrm{(B.6)}$$

The normalized variance is thus

$$\sigma^2(z, \Delta z) = \frac{\Sigma^2(z, \Delta z)}{\overline{n}(z, \Delta z)^2} . \qquad \mathrm{(B.7)}$$

Fig. B.4. Average differential number counts, in redshift
bins of size $\Delta z = 0.01$, for unit of solid angle, as a function
of redshift.



In general the variance for a point distribution is the sum
of the intrinsic variance due to correlations and the Poisson
noise term. Here we subtract the Poisson term, so we only
consider the intrinsic variance due to correlations. In Fig. B.5
we present the normalized (intrinsic) standard deviation
for different choices of $\Delta z$. When the redshift bin is increased
to $\Delta z = 0.05$ (which corresponds to $\Delta R \approx 150$
Mpc/h), fluctuations are still of ~ 15%, and they persist
at the different scales sampled by the survey, in agreement
with the results obtained by the apparent magnitude
counts analysis and with the analysis in other galaxy
redshift surveys (Ratcliffe et al. 1998; Busswell et al. 2003;
Sylos Labini et al. 2009b).
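The Poisson subtraction mentioned above can be sketched as follows. The text does not spell out the exact estimator, so the choices made here are assumptions: the input is taken to be raw galaxy counts in fields of equal solid angle, for which the Poisson variance equals the mean count, and a negative residual is clipped to zero.

```python
import math


def intrinsic_sigma(raw_counts):
    """Normalized standard deviation of counts after removing the Poisson term.

    raw_counts: number of galaxies N_i in one redshift bin for each field.
    """
    n_fields = len(raw_counts)
    mean = sum(raw_counts) / n_fields
    variance = sum((n - mean) ** 2 for n in raw_counts) / (n_fields - 1)
    # For uncorrelated points the variance of the counts equals the mean.
    intrinsic_variance = max(variance - mean, 0.0)
    return math.sqrt(intrinsic_variance) / mean


# Toy inputs: the first set has excess (correlated) scatter, the second does not.
sigma_clustered = intrinsic_sigma([80, 100, 120])
sigma_poisson_like = intrinsic_sigma([99, 100, 101])
```

In the first toy case the measured variance (400) exceeds the Poisson expectation (100), leaving an intrinsic fluctuation of about 17%; in the second the scatter is below the Poisson level and the estimate is clipped to zero.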



Fig. B.5. Standard deviation of the differential number
counts, in redshift bins of size $\Delta z = 0.01, 0.03, 0.05$, for
unit of solid angle, as a function of redshift. Poisson noise
has been subtracted, so this only contains the contribution
due to galaxy correlations.

Appendix C: The luminosity function

One of the main problems in the study of galaxy structures
is to disentangle the spatial properties of galaxies from
their luminosity distribution. Thus an important quantity
to be determined is the galaxy luminosity function $\phi(L)$;
the quantity $\phi(L)\,dL$ gives the probability that a
galaxy has luminosity in the range $[L, L + dL]$. In general the
assumption is made that the ensemble average number of
galaxies per unit volume and unit luminosity can be written
as

$$\langle \nu(\mathbf{r}, L) \rangle = \langle n(\mathbf{r}) \rangle \times \langle \phi(L) \rangle \qquad \mathrm{(C.1)}$$



where $\langle n(\mathbf{r}) \rangle$ is the ensemble average density and $\langle \phi(L) \rangle$
the ensemble average luminosity function. This implies
the independence between space and luminosity distributions,
i.e., that galaxy positions are independent of their
luminosities. Although there is clear evidence of a correlation
between them (for instance, the brightest elliptical
galaxies are found in the centers of rich galaxy
clusters), it has been tested that this is nevertheless a
reasonable assumption in the galaxy catalogs available
so far (see Gabrielli et al. 2005). To go beyond this assumption
one should use the multi-fractal formalism as in
Sylos Labini and Pietronero (1996).

An additional, much stronger, assumption often
adopted is that the space density is a constant, i.e.,
$\langle n(\mathbf{r}) \rangle = \mathrm{const}$. This assumption is for instance at the
basis of the so-called standard minimum variance estimator
(Davis & Huchra 1982; Blanton et al. 2003; Loveday
2004). We want to avoid making this further
assumption, because we want to test whether the space
density is (or can be approximated by) a simple constant.
It is also evident that if this assumption is inconsistent
with the sample data properties, all results derived from
methods encoding it are intrinsically biased.

To determine the shape of the luminosity function,
the so-called inhomogeneity-independent method is commonly
employed (Loveday 2004; Blanton et al. 2003),
which uses a modified version of Eq. C.1, namely that

$$\nu(R, L) = n(R) \times \phi(L) \qquad \mathrm{(C.2)}$$

where $n(R)$ is the density as a function of the radial (metric)
distance R and $\phi(L)$ the luminosity function. This can be
a useful working hypothesis. Under this approximation, in
a VL sample the luminosity function can be written as

$$\phi_{VL}(L) = \frac{\phi(L) \int_{R_{min}}^{R_{max}} n(R)\, \Omega R^2\, dR}{N} \qquad \mathrm{(C.3)}$$

where N is the total number of galaxies in the VL sample
and $\Omega$ its solid angle. In this way, even when $n(R)$ is highly
fluctuating, one may recover the shape of $\phi(L)$, as spatial
inhomogeneities cancel out in the ratio given in Eq. C.3.
Thus by making the normalized histogram of the number
of galaxies in luminosity bins in each VL sample, we get
$\phi_{VL}(L)$. Then we look for the best fit in all the VL samples
with the Schechter function (Schechter 1976)

$$\phi(L) = A \times L^{-\alpha} \exp(-L/L^*) . \qquad \mathrm{(C.4)}$$

Fig. C.1. Luminosity function in the SDSS K-corrected
catalog and its best-fit estimation with a Schechter function.

For this determination we used VL samples other than those listed
in Table 2; namely, we constructed VL samples each covering a range
of only one magnitude. We then find (see Fig. C.1),
in the K-corrected catalog with no E-corrections, that the
best-fit parameters are $\alpha = 1.22 \pm 0.02$ and $M^{*} = -20.63 \pm
0.02$, in good agreement with previous determinations (see
Loveday, 2004).
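
A fit of the form of Eq. C.4 can be sketched as follows. This is not necessarily the procedure used for Fig. C.1, which the text does not specify. It is a minimal, unweighted least-squares sketch in Python, assuming noise-free binned values and using hypothetical helper names (`solve_linear`, `fit_schechter`). It exploits the fact that the logarithm of Eq. C.4, ln φ = ln A + α ln L − L/L*, is linear in (ln A, α, 1/L*).

```python
import math

def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def fit_schechter(L_values, phi_values):
    """Unweighted fit of ln(phi) = ln(A) + alpha * ln(L) - L / L_star."""
    rows = [[1.0, math.log(L), -L] for L in L_values]
    y = [math.log(p) for p in phi_values]
    # Normal equations (X^T X) c = X^T y for c = (ln A, alpha, 1 / L_star).
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    Xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
    ln_A, alpha, inv_L_star = solve_linear(XtX, Xty)
    return math.exp(ln_A), alpha, 1.0 / inv_L_star

# Synthetic check with made-up parameters (not the SDSS values).
true_A, true_alpha, true_L_star = 2.0, -1.2, 1.5
L_grid = [0.2 + 0.2 * i for i in range(20)]
phi_grid = [true_A * L ** true_alpha * math.exp(-L / true_L_star) for L in L_grid]
A_fit, alpha_fit, L_star_fit = fit_schechter(L_grid, phi_grid)
print(A_fit, alpha_fit, L_star_fit)
```

With real histograms one would typically weight each bin by its counting error. Absolute magnitudes enter through the standard relation L/L* = 10^(-0.4 (M − M*)).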



To conclude this discussion we note that, while the
effect of inhomogeneities is fairly taken into account in
Eqs. C.1-C.2, the amplitude of the luminosity function is
usually estimated under the assumption that it is a constant
proportional to the average density. We have seen
that this assumption is not satisfied by the data; i.e.,
when n(r) has a clear scale dependence, the amplitude
of the luminosity function gives a systematically biased
estimation of the average density.
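
The origin of this bias can be illustrated with a toy calculation. Suppose, purely as an assumption for illustration, that the conditional density follows a power law n(r) = B r^(-γ) with γ < 3; the prefactor B, the exponent values, and the function name `mean_density_in_sphere` are hypothetical choices. The mean density inside a sphere of radius R is then 3B R^(-γ)/(3 − γ), which depends on the sample size R.

```python
import math

def mean_density_in_sphere(R, B=1.0, gamma=1.0):
    """Mean density inside radius R for the toy profile n(r) = B * r**(-gamma), gamma < 3."""
    if gamma >= 3.0:
        raise ValueError("gamma must be smaller than 3 for the integral to converge at r = 0")
    # Number of points inside R: integral of n(r) * 4 * pi * r^2 dr from 0 to R.
    n_inside = 4.0 * math.pi * B * R ** (3.0 - gamma) / (3.0 - gamma)
    volume = 4.0 * math.pi * R ** 3 / 3.0
    return n_inside / volume

# Compare a scale-dependent profile (gamma = 1) with a homogeneous one (gamma = 0).
for R in (25.0, 50.0, 100.0):
    print(R, mean_density_in_sphere(R, gamma=1.0), mean_density_in_sphere(R, gamma=0.0))
```

For γ = 0 the result is independent of R, recovering the homogeneous case in which a constant amplitude is meaningful. For γ = 1 the mean density halves each time R doubles, so an amplitude calibrated on it changes with sample depth.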







[Figure: horizontal axis of each panel is R (Mpc/h).]

Fig. 10. Behavior of the local average of $N(r; R, \Delta R)$ (see Eqs. 14-15), normalized to the whole sample average (see
Eq. 10 below), in bins of thickness $\Delta R = 10$ Mpc/h for sphere radius $r = 10$ Mpc/h, for the 5 VL samples
with K-correction (K), with evolution and K-correction (E+K), and without evolution and
K-correction (KO). The inset panel shows the number of centers over which the average and variance are computed
in each $\Delta R$ bin. In the bottom right panel we report the behavior of $N(r; R, \Delta R)$ in bins of thickness $\Delta R = 10$ Mpc/h
for $r = 20$ Mpc/h, normalized to the luminosity factors as explained in the text (see Sect. 4.6), for K-corrected VL
samples.




Fig. 15. Three-dimensional representation of the SL analysis with $r = 10$ Mpc/h for R3VL2. The x, z coordinates
of the sphere center define the bottom plane, and on the vertical axis we display the intensity of the structures, i.e., the
conditional number of galaxies $N_i(r)$ contained in the sphere of radius $r$.




Fig. 20. PDF of conditional fluctuations in the mock R1VL1, R1VL3, and R1VL5 samples in real space (red line) and
redshift space (black line). (Each row corresponds to a VL sample; the scale r is reported in the caption). The best-fit
Gaussian function (green line) is reported.



